Streaming Audio is a podcast from Confluent, the team that originally built Apache Kafka. Host Tim Berglund (Senior Director of Developer Advocacy, Confluent) and guests unpack a variety of topics surrounding Apache Kafka, event stream processing, and real-time data. The show covers frequently asked questions and comments from Twitter, YouTube, and elsewhere about the Confluent and Kafka ecosystems, from Kafka connectors to distributed systems, data integration, Kafka deployment, and managed Apache Kafka as a service. Apache®, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Apache Kafka® 3.5 is here, with the ability to preview migrating ZooKeeper-based clusters to KRaft mode. Follow along as Danica Fine highlights key release updates.
Kafka Core: KIP-833 provides an updated timeline for KRaft. KIP-866 is now in preview and allows migration from an existing ZooKeeper cluster to KRaft mode. KIP-900 introduces a way to bootstrap the KRaft controllers with SCRAM credentials. KIP-903 prevents a data-loss scenario by stopping replicas with stale broker epochs from joining the ISR list. KIP-915 streamlines the process of downgrading Kafka's transaction and group coordinators by introducing tagged fields.
Kafka Connect: KIP-710 provides the option to use a REST API for internal server communication in dedicated MirrorMaker 2 clusters, which can be enabled by setting `dedicated.mode.enable.internal.rest` to true. KIP-875 offers support for native offset management in Kafka Connect. Connect c
15/06/2023 • 11 minutes 25 seconds
A Special Announcement from Streaming Audio
After recording 64 episodes and featuring 58 amazing guests, the Streaming Audio podcast series has amassed over 130,000 plays on YouTube in the last year. We're extremely proud of these achievements and feel that it's time to take a well-deserved break. Streaming Audio will be taking a vacation! We want to express our gratitude to you, our valued listeners, for spending 10,000 hours with us on this incredible journey. Rest assured, we will be back with more episodes! In the meantime, feel free to revisit some of our previous episodes. For instance, you can listen to Anna McDonald share her stories about the worst Apache Kafka® bugs she’s ever seen, or listen to Jun Rao offer his expert advice on running Kafka in production. And who could forget the charming backstory behind Mitch Seymour's Kafka storybook, Gently Down the Stream? These memorable episodes brought us joy, and we're thrilled to have shared them with you. As we reflect on our accom
13/04/2023 • 1 minute 18 seconds
How to use Data Contracts for Long-Term Schema Management
Have you ever struggled with managing data long term, especially as the schema changes over time? In order to manage and leverage data across an organization, it’s essential to have well-defined guidelines and standards in place around data quality, enforcement, and data transfer. To get started, Abraham Leal (Customer Success Technical Architect, Confluent) suggests that organizations associate their Apache Kafka® data with a data contract (schema). A data contract is an agreement between a service provider and data consumers. It defines the management and intended usage of data within an organization. In this episode, Abraham talks to Kris about how to use data contracts and schema enforcement to ensure long-term data management. When an organization sends and stores critical and valuable data in Kafka, it usually wants to leverage that data in various ways across multiple business units. Kafka is particularly suited for this use case, but it can be
21/03/2023 • 57 minutes 28 seconds
How to use Python with Apache Kafka
Can you use Apache Kafka® and Python together? What’s the current state of Python support? And what are the best options to get started? In this episode, Dave Klein joins Kris to talk about all things Kafka and Python: the libraries, the tools, and the pros and cons. He also talks about the new course he just launched to support Python programmers entering the event-streaming world. Dave has been an active member of the Kafka community for many years and noticed that there were a lot of Kafka resources for Java but few for Python. So he decided to create a course to help people get started using Python and Kafka together. Historically, Java has had the most documentation, and people have often overlooked how good Kafka's Python support is. Python and Kafka are an ideal fit for machine learning applications and data engineering in general, and there are many use cases for building streaming and machine learning pipelines. In fact, someone conducted a
14/03/2023 • 31 minutes 57 seconds
Next-Gen Data Modeling, Integrity, and Governance with YODA
In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale. Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets. The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data. This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats,
07/03/2023 • 55 minutes 55 seconds
Migrate Your Kafka Cluster with Minimal Downtime
Migrating Apache Kafka® clusters can be challenging, especially when moving large amounts of data while minimizing downtime. Michael Dunn (Solutions Architect, Confluent) has worked in the data space for many years, designing and managing systems to support high-volume applications. He has helped many organizations strategize, design, and implement successful Kafka cluster migrations between different environments. In this episode, Michael shares some tips about Kafka cluster migration with Kris, including the pros and cons of the different tools he recommends. Michael explains that there are many reasons why companies migrate their Kafka clusters. For example, they may want to modernize their platforms, move to a self-hosted cloud server, or consolidate clusters. He tells Kris that creating a plan and selecting the right tool before getting started is critical for reducing downtime and minimizing migration risks. The good news is that a few tools can facilitate mo
01/03/2023 • 1 hour 1 minute 30 seconds
Real-Time Data Transformation and Analytics with dbt Labs
dbt is known as part of the Modern Data Stack (MDS) for ELT processes. As part of the MDS, dbt Labs believes in having the best of breed for every part of the stack. Oftentimes folks are using an EL tool like Fivetran to pull data from the database into the warehouse, then using dbt to manage the transformations in the warehouse. Analysts can then build dashboards on top of that data, or execute tests. An analyst could adapt this process for a microservice application using Apache Kafka®, applying the same method to pull batch data out of each and every database; however, in this episode, Amy Chen (Partner Engineering Manager, dbt Labs) tells Kris about a better way forward for analysts willing to adopt the streaming mindset: reusable pipelines using dbt models that immediately pull events into the warehouse and are built as materialized views by default. dbt Labs is the company that makes and maintains dbt. dbt Core is the open-source data transform
22/02/2023 • 43 minutes 41 seconds
What is the Future of Streaming Data?
What’s the next big thing in the future of streaming data? In this episode, Greg DeMichillie (VP of Product and Solutions Marketing, Confluent) talks to Kris about the future of stream processing in environments where the value of data lies in an organization's ability to intercept and interpret it. Greg explains that organizations typically focus on the infrastructure containers themselves, and not on the thousands of data connections that form within them. When they finally realize that they don't have a way to manage the complexity of these connections, a new problem arises: how do they approach managing such complexity? That’s where Confluent and Apache Kafka® come into play. They offer a consistent way to organize this seemingly endless web of data, so organizations don't have to face the daunting task of figuring out how to connect their shopping portals or jump through hoops trying different ETL tools on various systems. As more companies seek ways to manage this data, th
15/02/2023 • 41 minutes 29 seconds
What can Apache Kafka Developers learn from Online Gaming?
What can online gaming teach us about making large-scale event management more collaborative in real time? Ben Gamble (Developer Relations Manager, Aiven) has come to the world of real-time event streaming from an unusual source: the video games industry. And if you stop to think about it, modern online games are complex, distributed real-time data systems with decades of innovative techniques to teach us. In this episode, Ben talks with Kris about integrating gaming concepts with Apache Kafka®. Using Kafka for state management and stream processing, Ben has built systems that can handle real-time event processing at a massive scale, including interesting approaches to conflict resolution and collaboration. Building latency into a system is one way to mask data processing time. Ben says that you can efficiently hide latency issues and prioritize performance improvements by setting an initial target and then optimizing from there. If you measure before optimizing, you can
08/02/2023 • 55 minutes 32 seconds
Apache Kafka 3.4 - New Features & Improvements
Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect.
In Kafka Core: KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that consumer was a part of. KIP-830 includes a new configuration setting that allows you to disable the JMX reporter for environments where it’s not being used. KIP-854 introduces changes to clean up producer IDs more efficiently, to avoid excess memory usage. It introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs. KIP-866 (early access) provides a bridge to migrate existing ZooKeeper clusters to new KRaft mode
07/02/2023 • 5 minutes 13 seconds
How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems
How can you use OpenTelemetry to gain insight into your Apache Kafka® event systems? Roman Kolesnev, Staff Customer Innovation Engineer at Confluent, is a member of the Customer Solutions & Innovation Division Labs team working to build business-critical OpenTelemetry applications so companies can see what’s happening inside their data pipelines. In this episode, Roman joins Kris to discuss tracing and monitoring in distributed systems using OpenTelemetry. He talks about how monitoring each step of the process individually is critical to discovering potential delays or bottlenecks before they happen, including keeping track of timestamps, latency information, exceptions, and other data points that could help with troubleshooting. Tracing each request and its journey to completion in Kafka gives companies access to invaluable data that provides insight into system performance and reliability. Furthermore, using this data allows engineers to quickly identify errors or ant
01/02/2023 • 50 minutes 1 second
What is Data Democratization and Why is it Important?
Data democratization gives everyone in an organization access to the data they need and the tools to use that data effectively. In short, data democratization enables better business decisions. In this episode, Rama Ryali, a Senior IT and Data Executive, chats with Kris Jenkins about the importance of data democratization in modern systems. Rama explains that tech has unprecedented control over data and ignores basic business needs. Tech’s influence has largely gone unchecked and has led to a disconnect that often forces businesses to hire outside vendors for help turning their data into information they can use. In his role at RightData, Rama worked closely with Marketing, Sales, Customers, and Leadership to develop a no-code unified data platform that is accessible to everyone and fosters data democratization. So what is data democracy anyway? Rama explains that data democratization is the process of making data more accessibl
26/01/2023 • 47 minutes 27 seconds
Git for Data: Managing Data like Code with lakeFS
Is it possible to manage and test data like code? lakeFS is an open-source data version control tool that transforms object storage into Git-like repositories, offering teams a way to use the same workflows for code and data. In this episode, Kris sits down with guest Adi Polak, VP of DevX at Treeverse, to discuss how lakeFS can be used to facilitate better management and testing of data. At its core, lakeFS provides teams with better data management. Imagine a data engineer on a large team who runs a script to delete some data, but a bug in the script accidentally deletes far more data than intended. Application engineers who make a similar mistake in code can check out the main branch, effectively erasing their mistakes, but without a tool like lakeFS, this data engineer would be in a lot of trouble. Polak is quick to explain that lakeFS isn’t built on Git. The source code behind an application is usually a few dozen megabytes, while lakeFS is designed to handle petabytes of data; however, it do
19/01/2023 • 30 minutes 42 seconds
Using Kafka-Leader-Election to Improve Scalability and Performance
How does leader election work in Apache Kafka®? For the past two and a half years, Adithya Chandra, Staff Software Engineer at Confluent, has been working on Kafka scalability and performance, specifically partition leader election. In this episode, he gives Kris Jenkins a deep dive into the power of leader election in Kafka replication: why we need it, how it works, what can go wrong, and how it's being improved. Adithya explains that you can configure a certain number of replicas to be distributed across Kafka brokers; one of them is elected leader and the others become followers. This leader-based model proves efficient because clients only have to write to the leader, which handles the replication process internally. But what happens when a broker goes offline, when a replica reassignment occurs, or when a broker shuts down? Adithya explains that when these triggers occur, one of the followers becomes the elected leader, and all the other replicas take th
12/01/2023 • 51 minutes 6 seconds
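The follower-promotion step Adithya describes can be modelled in a few lines. The Python sketch below is a simplified model, not Kafka's controller code: it assumes the rule that a clean election picks the first replica in the partition's assignment order that is both alive and in the in-sync replica set (ISR), and that falling back to any live (possibly lagging) replica happens only when unclean leader election is enabled. The function name `elect_leader` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Set


def elect_leader(
    assignment: Sequence[int],
    isr: Set[int],
    live_brokers: Set[int],
    unclean_enabled: bool = False,
) -> Optional[int]:
    """Pick a new partition leader (hypothetical helper, simplified model)."""
    # Clean election: walk the replicas in assignment order and take the
    # first one that is both alive and fully caught up (in the ISR).
    for broker_id in assignment:
        if broker_id in live_brokers and broker_id in isr:
            return broker_id

    # Unclean election (opt-in): accept any live replica, even one that is
    # behind, at the risk of losing records the old leader had committed.
    if unclean_enabled:
        for broker_id in assignment:
            if broker_id in live_brokers:
                return broker_id

    # No eligible replica: the partition stays offline.
    return None


if __name__ == "__main__":
    # Broker 1 (the old leader) is offline; broker 2 is next in line.
    print(elect_leader([1, 2, 3], {1, 2, 3}, {2, 3}))  # prints 2
```

For example, with assignment `[1, 2, 3]` and broker 1 offline, the sketch elects broker 2; if broker 2 had dropped out of the ISR, it would elect broker 3 instead. Restricting the clean path to ISR members is what keeps a newly elected leader from missing committed records.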
Real-Time Machine Learning and Smarter AI with Data Streaming
Are bad customer experiences really just data integration problems? Can real-time data streaming and machine learning be democratized in order to deliver a better customer experience? Airy, an open-source data-streaming platform, uses Apache Kafka® to help business teams deliver better results to their customers. In this episode, Airy CEO and co-founder Steffen Hoellinger explains how his company is expanding the reach of stream-processing tools and ideas beyond the world of programmers. Airy originally built Conversational AI (chatbot) software and other customer support products for companies to engage with their customers in conversational interfaces. Asynchronous messaging created a large amount of traffic, so the company adopted Kafka to ingest and process all messages and events in real time. In 2020, the co-founders decided to open source the technology, positioning Airy as an open source app framework for conversational teams at large enterprises to inges
05/01/2023 • 38 minutes 56 seconds
The Present and Future of Stream Processing
The past year saw new trends emerge in the world of data streaming technologies, as well as some unexpected and novel use cases for Apache Kafka®. New reflections on the future of stream processing and when companies should adopt microservice architecture inspired several talks at this year’s industry conferences. In this episode, Kris is joined by his colleagues Danica Fine, Senior Developer Advocate, and Robin Moffatt, Principal Developer Advocate, for an end-of-year roundtable on this year’s developments and what they want to see in the year to come. Robin and Danica kick things off with a discussion of the year’s memorable conferences. Talk submissions for Kafka Summit London and Current 2022 covered noticeably more varied topics than in previous years, with fewer talks focused on the basics of Kafka implementation. Many abstracts featured interesting and unusual use cases, in addition to detailed explanations of what went wrong and how others could avoid the same i
28/12/2022 • 31 minutes 19 seconds
Top 6 Worst Apache Kafka JIRA Bugs
Entomophiliac Anna McDonald (Principal Customer Success Technical Architect, Confluent) has seen her fair share of Apache Kafka® bugs. For her annual holiday roundup of the most noteworthy Kafka bugs, Anna tells Kris Jenkins about some of the scariest, most surprising, and most enlightening corner cases that make you ask, “Ah, so that’s how it really works?” She shares a lot of interesting details about how batching works, the replication protocol, how Kafka’s networking stack dances with Linux’s, and which Scala class is the most important to read if you’re only going to read one. In particular, Anna gives Kris details about a bug that he’s been thinking about lately: the sticky partitioner bug (KAFKA-10888). When a Kafka producer sends several records to the same partition at around the same time, the partition can get overloaded. As a result, if too many records get processed at once, they can get stuck, causing an unbalanced workload. Anna goes on to explain that
21/12/2022 • 1 hour 10 minutes 58 seconds
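The sticky-partitioning idea behind KAFKA-10888 can be illustrated with a toy model. This Python sketch is a simplification, not the Kafka producer's actual implementation: it assumes that records without a key are sent to one partition until roughly `batch_size_bytes` (a hypothetical stand-in for the producer's batch size) have accumulated there, after which the partitioner switches to a different, randomly chosen partition. The class and method names are hypothetical.

```python
import random
from typing import Optional


class StickyPartitioner:
    """Toy model of sticky partitioning for keyless records (hypothetical class)."""

    def __init__(self, num_partitions: int, batch_size_bytes: int,
                 rng: Optional[random.Random] = None) -> None:
        if num_partitions < 1:
            raise ValueError("need at least one partition")
        self.num_partitions = num_partitions
        self.batch_size_bytes = batch_size_bytes
        self.rng = rng or random.Random()
        self.current = self.rng.randrange(num_partitions)
        self.bytes_in_current = 0

    def partition(self, record_size: int) -> int:
        """Return the partition for the next keyless record of record_size bytes."""
        # Stick with the current partition until it has received a full batch's worth.
        if self.bytes_in_current >= self.batch_size_bytes:
            self._switch()
        self.bytes_in_current += record_size
        return self.current

    def _switch(self) -> None:
        # Move to a different partition, chosen at random, and reset the byte count.
        if self.num_partitions > 1:
            others = [p for p in range(self.num_partitions) if p != self.current]
            self.current = self.rng.choice(others)
        self.bytes_in_current = 0


if __name__ == "__main__":
    partitioner = StickyPartitioner(num_partitions=3, batch_size_bytes=100,
                                    rng=random.Random(42))
    # Ten 40-byte records: they arrive in runs of three on the same partition.
    print([partitioner.partition(40) for _ in range(10)])
```

Sticking to one partition lets the producer fill larger batches instead of spreading tiny batches across every partition. Switching on a byte count, rather than on whenever a batch happens to be sent, is one way to keep the data per partition roughly even; the real producer's logic is more involved.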
Learn How Stream-Processing Works The Simplest Way Possible
Could you explain Apache Kafka® in ways that a small child could understand? When Mitch Seymour, author of Mastering Kafka Streams and ksqlDB, wanted a way to communicate the basics of Kafka and event-based stream processing, he decided to author a children’s book on the subject, but it turned into something with a far broader appeal. Mitch conceived the idea while writing a traditional manuscript for engineers and technicians interested in building stream processing applications. He wished he could explain what he was writing about to his 2-year-old daughter, and contemplated the best way to introduce the concepts in a way anyone could grasp. Four months later, he had completed the illustrated book: Gently Down the Stream: A Gentle Introduction to Apache Kafka. It tells the story of a family of forest-dwelling Otters, who discover that they can use a giant river to communicate with each other. When more Otter families move into the forest, they
20/12/2022 • 31 minutes 29 seconds
Building and Designing Events and Event Streams with Apache Kafka
What are the key factors to consider when developing event-driven architecture? When properly designed, events can connect existing systems with a common language and allow data exchange in near real time. They also help reduce complexity by providing a single source of truth that eliminates the need to synchronize data between different services or applications. They enable dynamic behavior, allowing each service or application to respond quickly to changes in its environment. Using events, developers can create systems that are more reliable, responsive, and easier to maintain. In this podcast, Adam Bellemare, staff technologist at Confluent, discusses the four dimensions of events, best practices for designing event streams, and a new course he just authored. This course, called Introduction to Designing Events and Event Streams, walks you through the process of properly designing events and event streams in any event-driven architecture
15/12/2022 • 53 minutes 6 seconds
Rethinking Apache Kafka Security and Account Management
Is there a better way to manage access to resources without compromising security? New employees need access to a variety of resources within a company's tech stack, but manually granting access can be error-prone. And when employees leave, their access must be revoked, which can introduce security risks if an admin misses something. In this podcast, Kris Jenkins talks to Anuj Sawani (Security Product Manager, Confluent) about the centralized identity management system he helped build to integrate with Apache Kafka® to prevent common identity management headaches and security risks. With 12+ years of experience building cybersecurity products for enterprise companies, Anuj Sawani explains how he helped build out KIP-768 (Secured OAuth support in Kafka), which supports a unified identity mechanism that spans cloud and on-premises deployments (hybrid scenarios). Confluent Cloud customers wanted a single identity to access all their services. The manual process requi
08/12/2022 • 41 minutes 23 seconds
Real-time Threat Detection Using Machine Learning and Apache Kafka
Can we use machine learning to detect security threats in real time? As organizations increasingly rely on distributed systems, it is becoming more important to analyze the traffic that passes through those systems quickly. Confluent Hackathon ’22 finalist Géraud Dugé de Bernonville (Data Consultant, Zenika Bordeaux) shares how his team used TensorFlow (machine learning) and Neo4j (graph database) to analyze network traffic data and detect threats in real time. What started as a research and development exercise turned into ZIEM, a full-blown internal project using ksqlDB to manipulate, export, and visualize data from Apache Kafka®. Géraud and his team noticed that large amounts of data passed through their network, and they were curious to see if they could detect threats as they happened. As a hackathon project, they built ZIEM, a network mapping and intrusion detection platform that quickly generates network diagrams. Using Kafka, the system captures network packets, processes
29/11/2022 • 29 minutes 18 seconds
Improving Apache Kafka Scalability and Elasticity with Tiered Storage
What happens when you need to store more than a few petabytes of data? Rittika Adhikari (Software Engineer, Confluent) discusses how her team implemented tiered storage, a method for improving the scalability and elasticity of data storage in Apache Kafka®. She also explores the motivating factors for building it in the first place: cost, performance, and manageability. Before Tiered Storage, there was no real way to retain Kafka data indefinitely. Because of the tight coupling between compute and storage, users were forced to use different tools to access cold and hot data. Additionally, the cost of re-replication was prohibitive because Kafka had to process large amounts of data rather than small hot sets. As a member of the Kafka Storage Foundations team, Rittika explains to Kris Jenkins how her team initially considered a Kafka data lake but settled on a more cost-effective method: tiered storage. With tiered storage, one tier handles elasticity and throughpu
22/11/2022 • 29 minutes 32 seconds
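One consequence of the hot/cold split Rittika describes is that each read has to be routed to the right tier. The Python sketch here is a hypothetical model, not Confluent's or Apache Kafka's implementation: it assumes each partition tracks three offsets, the earliest offset retained in any tier (`log_start_offset`), the earliest offset still on the broker's local disk (`local_log_start_offset`), and the offset the next appended record will receive (`log_end_offset`). Recent data is served from local disk; older offsets fall through to a remote object store. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredPartition:
    """Offsets tracked for one partition (hypothetical model)."""
    log_start_offset: int        # earliest offset retained in any tier
    local_log_start_offset: int  # earliest offset still on local broker disk
    log_end_offset: int          # offset the next appended record will receive


def tier_for_offset(partition: TieredPartition, offset: int) -> str:
    """Return which tier should serve a read starting at `offset`."""
    # Offsets below the log start have been deleted everywhere; offsets at or
    # beyond the log end have not been written yet.
    if offset < partition.log_start_offset or offset >= partition.log_end_offset:
        raise ValueError(f"offset {offset} is out of range")
    # Recent (hot) data is still on local disk; older (cold) data lives only remotely.
    if offset >= partition.local_log_start_offset:
        return "local"
    return "remote"


if __name__ == "__main__":
    p = TieredPartition(log_start_offset=0, local_log_start_offset=9_000,
                        log_end_offset=10_000)
    print(tier_for_offset(p, 9_500))  # prints local
    print(tier_for_offset(p, 42))     # prints remote
```

In this model, only the local (hot) range would need to be re-replicated between brokers, which matches the episode's point that re-replicating large amounts of data, rather than small hot sets, was what made the old approach costly.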
Decoupling with Event-Driven Architecture
In principle, data mesh architecture should liberate teams to build their systems and gather data in a distributed way, without having to explicitly coordinate. Data is the thing that can and should decouple teams, but proper implementation has its challenges. In this episode, Kris talks to Florian Albrecht (Solution Architect, Hermes Germany) about Galapagos, an open-source DevOps software tool for Apache Kafka® that Albrecht created with his team at Hermes, a German parcel delivery company. After Hermes chose Kafka to implement company-wide event-driven architecture, Albrecht’s team created rules and guidelines on how to use and really make the most out of Kafka. But the hands-off approach wasn’t leading to greater independence, so Albrecht’s team tried an alternative to documentation: they encoded the rules as software. This method pushed the teams to stop thinking in terms of data and to start thinking in terms of events. Previously, applications
15/11/2022 • 38 minutes 38 seconds
If Streaming Is the Answer, Why Are We Still Doing Batch?
Is real-time data streaming the future, or will batch processing always be with us? Interest in streaming data architecture is booming, but just as many teams are still happily batching away. Batch processing is still simpler to implement than stream processing, and successfully moving from batch to streaming requires a significant change to a team’s habits and processes, as well as a meaningful upfront investment. Some are even running dbt in micro batches to simulate an effect similar to streaming, without having to make the full transition. Will streaming ever fully take over? In this episode, Kris talks to a panel of industry experts with decades of experience building and implementing data systems. They discuss the state of streaming adoption today, whether streaming will ever fully replace batch, and whether it even could (or should). Is micro batching the natural stepping stone between batch and streaming? Will there ever be a unified understanding on how data should be p
09/11/2022 • 43 minutes 58 seconds
Security for Real-Time Data Stream Processing with Confluent Cloud
Streaming real-time data at scale and processing it efficiently is critical to cybersecurity organizations like SecurityScorecard. Jared Smith, Senior Director of Threat Intelligence, and Brandon Brown, Senior Staff Software Engineer, Data Platform at SecurityScorecard, discuss their journey from using RabbitMQ to open-source Apache Kafka® for stream processing, as well as why turning to fully managed Kafka on Confluent Cloud is the right choice for building real-time data pipelines at scale. SecurityScorecard mines data from dozens of digital sources to discover security risks and flaws with the potential to expose their clients' data. This includes scanning and ingesting data from a large number of ports to identify suspicious IP addresses, exposed servers, out-of-date endpoints, malware-infected devices, and other potential cyber threats for more than 12 million companies worldwide. To allow real-time stream processing for the organization, the team moved away f
03/11/2022 • 48 minutes 33 seconds
Running Apache Kafka in Production
What are some recommendations to consider when running Apache Kafka® in production? Jun Rao, one of the original Kafka creators, as well as an ongoing committer and PMC member, shares the essential wisdom he's gained from developing Kafka and dealing with a large number of Kafka use cases.
Here are 6 recommendations for maximizing Kafka in production:
1. Nail Down the Operational Part
When setting up your cluster, in addition to dealing with the usual architectural issues, make sure to also invest time in alerting, monitoring, logging, and other operational concerns. Managing a distributed system can be tricky, and you have to make sure that all of its parts are healthy together. This will give you a chance at catching cluster problems early, rather than after they have become full-blown crises.
2. Reason Properly About Serialization and Schemas Up Front
At the Kafka API level, events are just bytes, which gives your application
27/10/2022 • 58 minutes 44 seconds
Build a Real Time AI Data Platform with Apache Kafka
Is it possible to build a real-time data platform without using stateful stream processing? Forecasty.ai is an artificial intelligence platform for forecasting commodity prices, imparting insights into the future valuations of raw materials for users. Nearly all AI models are batch-trained once, but precious commodities are linked to ever-fluctuating global financial markets, which require real-time insights. In this episode, Ralph Debusmann (CTO, Forecasty.ai) shares their journey of migrating from a batch machine learning platform to a real-time event streaming system with Apache Kafka® and delves into their approach to making the transition frictionless. Ralph explains that Forecasty.ai was initially built on top of batch processing; however, updating the models with batch-data syncs was costly and environmentally taxing. There was also the question of scalability: progressing from 60 commodities on offer to their eventual plan of over 200 commodities. Ralph observed that m
20/10/2022 • 37 minutes 18 seconds
Optimizing JVMs for Apache Kafka
Java Virtual Machines (JVMs) impact Apache Kafka® performance in production. How can you optimize your event-streaming architectures so they process more Kafka messages using the same number of JVMs? Gil Tene (CTO and Co-Founder, Azul) delves into JVM internals and how developers and architects can use Java and optimized JVMs to make real-time data pipelines more performant and more cost-effective, illustrated with use cases. Gil has deep roots in Java optimization, having started out building large data centers for parallel processing, where the goal was to get a finite set of hardware to run the largest possible number of JVMs. As the industry evolved, Gil switched his primary focus to software, and throughout the years, has gained particular expertise in garbage collection (the C4 collector) and JIT compilation. The OpenJDK distribution Gil's company Azul releases, Zulu, is widely used throughout the Java world, although Azul's Prime build version can run Kafka up to forty-pe
Apache Kafka® 3.3 is released! After over two years of development, KIP-833 marks KRaft as production-ready for new AK 3.3 clusters only. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of this release, with KIPs from Kafka Core, Kafka Streams, and Kafka Connect. To reduce request overhead and simplify client-side code, KIP-709 extends the OffsetFetch API requests to accept multiple consumer group IDs. This update has three parts: extending the wire protocol, changing response handling, and enhancing the AdminClient to use the new protocol. Log recovery is an important process that is triggered whenever a broker starts up after an unclean shutdown. Since there was previously no way to know the log recovery progress other than checking whether the broker log was busy, KIP-831 adds metrics for the log recovery progress with `RemainingLogsToRecover` and `RemainingSegmentsToRecover` for each recovery thread. These metrics a
03/10/2022 • 6 minutes 42 seconds
Application Data Streaming with Apache Kafka and Swim
How do you set data applications in motion by running stateful business logic on streaming data? According to Fred Patton (Developer Evangelist, Swim Inc.), capturing key stream processing events and cumulative statistics that require real-time data assessment, migration, and visualization remains a gap for event-driven systems and stream processing frameworks. In this episode, Fred explains streaming applications and how they contrast with stream processing applications. Fred and Kris also discuss how you can use Apache Kafka® and Swim for a real-time UI for streaming data. Swim's technology facilitates relationships between streaming data from distributed sources and complex UIs, managing backpressure cumulatively, so that front ends don't get overwhelmed. They are focused on real-time, actionable insights, as opposed to those derived from historical data. Fred compares Swim's functionality to the speed layer in the Lambda architecture model, which is s
03/10/2022 • 39 minutes 10 seconds
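The Swim episode makes the point that backpressure has to be managed so front ends are not overwhelmed by a fast stream. One common general technique for this is conflation: keep only the newest value per key, and let a slow consumer take a snapshot when it is ready. The Python sketch below shows that technique only. It is not Swim's API, and the `ConflatingBuffer` name is hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from threading import Lock


class ConflatingBuffer:
    """Keeps only the newest value per key until a consumer drains it."""

    def __init__(self) -> None:
        self._latest: OrderedDict[str, float] = OrderedDict()
        self._lock = Lock()

    def offer(self, key: str, value: float) -> None:
        # Overwrite any unread value for this key instead of queueing another one.
        with self._lock:
            self._latest[key] = value
            self._latest.move_to_end(key)

    def drain(self) -> dict[str, float]:
        # Hand the consumer at most one value per key, then start over.
        with self._lock:
            snapshot = dict(self._latest)
            self._latest.clear()
            return snapshot


buf = ConflatingBuffer()
for i in range(1000):
    buf.offer("sensor-a", float(i))
buf.offer("sensor-b", 42.0)
latest = buf.drain()
print(latest)  # {'sensor-a': 999.0, 'sensor-b': 42.0}
```

A thousand updates to one key reach the UI as a single value, so the UI's workload depends on how many keys it displays rather than on the incoming event rate.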
International Podcast Day - Apache Kafka Edition | Streaming Audio Special
What’s your favorite podcast? Would you like to find some new ones? In celebration of International Podcast Day, Kris Jenkins invites 12 experts from the Apache Kafka® community to talk about their favorite podcasts. Unlike other episodes, where guests educate developers and tell stories about Kafka, its surrounding technology ecosystem, or the cloud, this special episode offers a glimpse into what these guests have learned from podcasts that you might also find interesting. On a virtual international tour, Kris chats with Bill Bejeck (Integration Architect, Confluent), Nikoleta Verbeck (Senior Solutions Engineer, CSID, Confluent), Ben Stopford (Lead Technologist, OCTO, Confluent), Noelle Gallagher (Video Producer, Editor), Danica Fine (Senior Developer Advocate, Confluent), Tim Berglund (VP, Developer Relations, StarTree), Ben Ford (Founder and CEO, Commando Development), Jeff Bean (Group Manager, Technical Marketing, Confluent), Domenico Fioravant
30/09/2022 • 1 hour 2 minutes 22 seconds
How to Build a Reactive Event Streaming App - Coding in Motion
How do you build an event-driven application that can react to real-time data streams as they happen? Kris Jenkins (Senior Developer Advocate, Confluent) will host another fun, hands-on programming workshop, Coding in Motion: Watching the River Flow, to demonstrate how you can build a reactive event streaming application with Apache Kafka®, ksqlDB, and Python. As a developer advocate, Kris often speaks at conferences, and his talks are often made available on demand through the organizer’s YouTube channel. The desire to read comments and interact with the community motivated Kris to set up a real-time event streaming application that would notify him on his mobile phone. During the workshop, Kris will demonstrate the end-to-end process of using Python to process and stream data from YouTube’s REST API into a Kafka topic, analyze the data with ksqlDB, and then stream data out via Telegram. After the workshop, you’ll be able to use the recipe to build you
20/09/2022 • 1 minute 26 seconds
Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka
Processing real-time event streams enables countless use cases, big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events. In this episode, Simon shares his winning Confluent Hackathon ’22 project: a wildlife monitoring system that observes population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges. Open-source object detection models for TensorFlow, which are appropriately collected into "model zoos," meant that Simon didn't have to provide his own object identification as part of the project, which wou
15/09/2022 • 34 minutes 7 seconds
Reddit Sentiment Analysis with Apache Kafka-Based Microservices
How do you analyze Reddit sentiment with Apache Kafka® and microservices? Shufan Liu, a new Developer Advocate at Confluent, brings the fresh perspective of someone who is new to both Kafka and the industry. He discusses the projects he worked on during his summer internship: a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to get up to speed quickly with the tools in the Kafka ecosystem and to start building something productive early in your journey. Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog post, Data Enrichment in Existing Data P
08/09/2022 • 35 minutes 23 seconds
Capacity Planning Your Apache Kafka Cluster
How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance? When Jason Bell (Principal Engineer, Dataworks, and founder of Synthetica Data) begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself, determining its volume as well as its contents: Is it JSON, plain text, or images? He then determines whether Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, and potential cost. Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower, and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby; this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out
30/08/2022 • 1 hour 1 minute 54 seconds
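Jason's rule of thumb that a ZooKeeper quorum is a set of three follows from majority voting. An ensemble of n servers needs floor(n/2) + 1 of them to be available, so it can tolerate floor((n - 1)/2) failures. The small Python helper below makes that calculation. The function name is ours and does not come from any tool. The output shows why odd sizes are preferred: going from three servers to four adds no fault tolerance.

```python
from __future__ import annotations


def quorum_stats(ensemble_size: int) -> tuple[int, int]:
    """Return (majority needed, failures tolerated) for a majority-vote ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble needs at least one server")
    majority = ensemble_size // 2 + 1
    tolerated = ensemble_size - majority  # equals (n - 1) // 2
    return majority, tolerated


for n in (3, 4, 5):
    majority, tolerated = quorum_stats(n)
    print(f"{n} servers: majority {majority}, tolerates {tolerated} failure(s)")
# 3 servers: majority 2, tolerates 1 failure(s)
# 4 servers: majority 3, tolerates 1 failure(s)
# 5 servers: majority 3, tolerates 2 failure(s)
```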
Streaming Real-Time Sporting Analytics for World Table Tennis
Reimagining a data architecture to provide real-time data flow for sporting events can be complicated, especially for organizations with as much data as World Table Tennis (WTT). Vatsan Rama (Director of IT, ITTF Group) shares why real-time data is essential in the sporting world and how his team reengineered their data system in 18 months, moving from a solely on-premises infrastructure to a cloud-native data system that uses Confluent Cloud with Apache Kafka® as its central nervous system. World Table Tennis is a business created by the International Table Tennis Federation (ITTF) to manage the official professional table tennis series of events and its commercial rights. World Table Tennis is also leading the sport's digital transformation and commercializes its software application for real-time event scoring worldwide. Previously, ITTF scoring was processed manually with a desktop-based, on-venue results system (OVR), an on-premises solution to process match data that calc
25/08/2022 • 34 minutes 29 seconds
Real-Time Event Distribution with Data Mesh
Inheriting software in the banking sector can be challenging. Perhaps the only thing harder is inheriting software built by a committee of banks. How do you keep it running while improving it, refactoring it, and planning a bigger future for it? In this episode, Jean-Francois Garet (Technical Architect, Symphony) shares his experience at Symphony as he helps it evolve from an inherited, monolithic, single-tenant architecture to an event mesh for seamless event-streaming microservices. He talks about the journey they’ve taken so far and the foundations they’ve laid for a modern data mesh. Symphony is the leading markets infrastructure and technology platform, which provides a full communication stack (chat, voice and video meetings, file and screen sharing) for the financial industry. Jean-Francois shares that its initial system was inherited from one of the founding institutions and features the highest level of security to ensure confidentiality of business conversations,
18/08/2022 • 48 minutes 59 seconds
Apache Kafka Security Best Practices
Security is a primary consideration for any system design, and Apache Kafka® is no exception. Out of the box, Kafka has relatively little security enabled. Rajini Sivaram (Principal Engineer, Confluent, and co-author of “Kafka: The Definitive Guide”) discusses how Kafka has evolved from a system with no security into an extensible, flexible platform on which any business can build a secure messaging system. She shares considerations, important best practices, and features Kafka provides to help you design a secure modern data streaming system. To build a secure Kafka installation, you need to authenticate your users securely, whether you are using Kerberos (SASL/GSSAPI), SASL/PLAIN, SCRAM, or OAuth. Verifying that your users can authenticate, and that non-users can’t, is a primary requirement for any connected system. But authentication is only one part of the security story; we also need to address other areas. Kafka added support for fine-grained access con
11/08/2022 • 39 minutes 10 seconds
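The Python sketch below gives a concrete starting point for the authentication step in the security episode. It builds client settings for SASL/SCRAM over TLS. The keys (`bootstrap.servers`, `security.protocol`, `sasl.mechanisms`, `sasl.username`, `sasl.password`) are librdkafka configuration properties, as used by Confluent's Python client. The broker address and the `KAFKA_USER`/`KAFKA_PASSWORD` environment-variable names are placeholders we made up. The helper only builds a dict, so it runs without a cluster.

```python
from __future__ import annotations

import os

ALLOWED_SCRAM = {"SCRAM-SHA-256", "SCRAM-SHA-512"}


def scram_client_config(bootstrap: str, mechanism: str = "SCRAM-SHA-512") -> dict[str, str]:
    """Build librdkafka-style settings for SASL/SCRAM authentication over TLS."""
    if mechanism not in ALLOWED_SCRAM:
        raise ValueError(f"unsupported SCRAM mechanism: {mechanism}")
    return {
        "bootstrap.servers": bootstrap,
        # SASL_SSL: authenticate with SASL and encrypt the connection with TLS.
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": mechanism,
        # Read credentials from the environment instead of hard-coding them.
        # KAFKA_USER / KAFKA_PASSWORD are hypothetical variable names.
        "sasl.username": os.environ.get("KAFKA_USER", ""),
        "sasl.password": os.environ.get("KAFKA_PASSWORD", ""),
    }


config = scram_client_config("broker.example.com:9093")
print(config["security.protocol"], config["sasl.mechanisms"])  # SASL_SSL SCRAM-SHA-512
```

You could pass the resulting dict to a client constructor such as `confluent_kafka.Producer(config)`. Authentication alone does not decide what a user may do. Authorization through ACLs is a separate layer.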
What Could Go Wrong with a Kafka JDBC Connector?
Java Database Connectivity (JDBC) is the Java API used to connect to a database. Because the JDBC connector is one of the most popular Kafka connectors, it's important to prevent issues with your integrations. In this episode, we'll cover how a JDBC connection works and common issues with your database connection. Why the Kafka JDBC Connector? When it comes to streaming database events into Apache Kafka®, the JDBC connector is usually the first choice because of its flexibility and its ability to support a wide variety of databases without requiring custom code. Drawing on his experience as a data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) describes building streaming Kafka data pipelines with the JDBC source connector and explains what could go wrong. He discusses alternatives that avoid these problems, including the Debezium source connector for real-time change data capture. The JDBC connector is a Java API for Kafka Connect,
04/08/2022 • 41 minutes 10 seconds
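The Python sketch below illustrates one well-known limitation of query-based capture. It mimics the polling pattern behind the JDBC source connector's `incrementing` mode: remember the highest ID seen, then repeatedly select newer rows. The standard-library `sqlite3` module stands in for the database. The table, the columns, and the `poll_new_rows` helper are hypothetical, and this is not the connector's code.

```python
from __future__ import annotations

import sqlite3


def poll_new_rows(conn: sqlite3.Connection, last_id: int) -> tuple[list[tuple[int, str]], int]:
    """Return rows with id > last_id and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


conn = sqlite3.connect(":memory:")
# AUTOINCREMENT stops SQLite from reusing the ID of a deleted row.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("ada",), ("grace",)])

batch1, offset = poll_new_rows(conn, last_id=0)  # picks up ids 1 and 2

conn.execute("UPDATE customers SET name = 'ada l.' WHERE id = 1")  # update: id unchanged
conn.execute("DELETE FROM customers WHERE id = 2")  # delete: row just vanishes
conn.execute("INSERT INTO customers (name) VALUES ('linus')")  # insert: new id 3

batch2, offset = poll_new_rows(conn, offset)  # only the insert is seen
print(batch1)  # [(1, 'ada'), (2, 'grace')]
print(batch2)  # [(3, 'linus')]
```

The poller never sees the update, because the ID did not change. It also never sees the delete, because the row simply stops appearing. Log-based change data capture, such as the Debezium connector mentioned above, reads the database's change log instead and so avoids this gap.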
Apache Kafka Networking with Confluent Cloud
Setting up reliable cloud networking for your Apache Kafka® infrastructure can be complex. There are many factors to consider: cost, security, scalability, and availability. With extensive experience building cloud-native Kafka solutions on Confluent Cloud, Justin Lee (Principal Solutions Engineer, Enterprise Solutions Engineering, Confluent) and Dennis Wittekind (Customer Success Technical Architect, Customer Success Engineering, Confluent) talk about the different networking options on Confluent Cloud, including AWS Transit Gateway and AWS and Azure Private Link, and discuss when and why you might choose one over another. To build a secure cloud-native Kafka network, you need to consider information security and compliance requirements. These requirements may vary depending on your industry, location, and regulatory environment. For example, in financial organizations, transaction data or personally identifiable information (PII) may not be accessible over the interne
28/07/2022 • 37 minutes 22 seconds
Event-Driven Systems and Agile Operations
How do the principles of chaotic, agile operations in the military apply to software development and event-driven systems? A former Royal Marine, Ben Ford (Founder and CEO, Commando Development) is also a software developer with many years of experience building event streaming architectures across financial services and startups. He shares the principles the military employs in chaotic conditions and explains how they can be applied to event streaming and agile development. According to Ben, the operational side of the military is highly emergent and reacts to situations as they unfold, much like real-time, event-driven systems. Having spent the last five years researching, adapting, and applying these principles to technology leadership, he identifies parallels between these concepts and areas ranging from DevOps to organizational architecture, and even the development of data streaming applications. One of the concepts Ben and Kris talk through is Colonel John Boyd’s OODA l
21/07/2022 • 53 minutes 22 seconds
Streaming Analytics and Real-Time Signal Processing with Apache Kafka
Imagine being able to process and analyze real-time event streams for intelligence that mitigates cyber threats, or that keeps soldiers constantly alert to risks and to the precautions they should take as events unfold. In this episode, Jeffrey Needham (Senior Solutions Engineer, Advanced Technology Group, Confluent) shares use cases showing how Apache Kafka® can be used for real-time signal processing to mitigate risk before it arises. He also explains the classic Kafka transactional processing defaults and the distinction between transactional and analytic processing. Jeffrey is part of the customer solutions and innovations division (CSID), which designs event streaming platforms and innovations that improve productivity for organizations by pushing the envelope of Kafka for real-time signal processing. What is signal intelligence? Jeffrey explains that it's not always affiliated with the military. Signal processing improves your operational or situational awareness by
14/07/2022 • 1 hour 6 minutes 33 seconds
Blockchain Data Integration with Apache Kafka
How is Apache Kafka® relevant to blockchain technology and cryptocurrency? Fotios Filacouris (Staff Solutions Engineer, Confluent) has been working with Kafka for close to five years, primarily designing architectural solutions for financial services, and he also has expertise in blockchain. In this episode, he joins Kris to discuss how blockchain and Kafka are complementary. He also highlights some of the emerging use cases he has seen that use Kafka in conjunction with traditional distributed ledger technology (DLT) as well as blockchain technologies. According to Fotios, Kafka and the notion of blockchain share many traits, such as immutability, replication, distribution, and the decoupling of applications. This complementary relationship means that they can function well together if you are looking to extend the functionality of a given DLT through sidechain or off-chain activities, such as analytics, integrations with traditional enterprise systems, or even the inte
To ensure safe and efficient deployment of Apache Kafka® clusters across multiple cloud providers, Confluent rolled out a large-scale cluster management solution. Rashmi Prabhu (Staff Software Engineer & Eng Manager, Fleet Management Platform, Confluent) and her team have been building the Fleet Management Platform for Confluent Cloud. In this episode, she explains what Fleet Management is and how the cluster management service streamlines Kafka operations in the cloud while providing a seamless developer experience. When it comes to performing operations at large scale in the cloud, manual processes work well if the scenario involves only a handful of clusters. However, as a business grows, its cloud footprint may scale 10x, requiring upgrades across a significantly larger cluster fleet. Additionally, the process should be automated to accelerate feature releases while ensuring safe and mature operations. Fleet Management lets yo
30/06/2022 • 48 minutes 29 seconds
Common Apache Kafka Mistakes to Avoid
What are some of the common mistakes that you have seen with Apache Kafka® record production and consumption? Nikoleta Verbeck (Principal Solutions Architect at Professional Services, Confluent) has a role that specifically tasks her with performance tuning as well as troubleshooting Kafka installations of all kinds. Based on her field experience, she put together a comprehensive list of common issues, with recommendations for building, maintaining, and improving Kafka systems that apply across use cases. Kris and Nikoleta begin by discussing how common it is for those migrating to Kafka from other message brokers to implement too many producers, rather than one per service. The Kafka producer is thread-safe, and one producer instance can write to multiple topics. Traditional message brokers differ here, since you may tend to use a client per topic. Monitoring is an unequivocal good in any Kafka system. Nikoleta notes that it is better to monitor from the start of
23/06/2022 • 1 hour 9 minutes 43 seconds
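Nikoleta's one-producer-per-service advice can be sketched as follows. A real client needs a broker, so this Python example uses a hypothetical, lock-protected `InMemoryProducer` as a stand-in. With Confluent's Python client, the shared object would instead be a single `confluent_kafka.Producer`, whose `produce()` method can likewise be called from several threads and for several topics. The point is the overall pattern: create one instance at startup and share it, rather than creating a client per topic or per thread.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class InMemoryProducer:
    """Hypothetical stand-in for a thread-safe Kafka producer client."""

    def __init__(self) -> None:
        self._lock = Lock()
        self.topics: dict[str, list[bytes]] = defaultdict(list)

    def produce(self, topic: str, value: bytes) -> None:
        # One instance serves every topic; the lock makes concurrent calls safe.
        with self._lock:
            self.topics[topic].append(value)


# Created once when the service starts, then shared by all worker threads.
producer = InMemoryProducer()


def handle_order(order_id: int) -> None:
    # A single handler writes to two different topics with the same producer.
    producer.produce("orders", f"order-{order_id}".encode())
    producer.produce("audit", f"received-{order_id}".encode())


with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(handle_order, range(100)))

print({topic: len(msgs) for topic, msgs in producer.topics.items()})
# {'orders': 100, 'audit': 100}
```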
Tips For Writing Abstracts and Speaking at Conferences
A well-written abstract is your ticket to conferences, but how do you write an excellent synopsis that will get accepted? As an experienced conference speaker, Robin Moffatt (Principal Developer Advocate, Confluent) often writes presentations that help the developer community understand Apache Kafka® and its ecosystem. He is also the Program Committee Chair for Kafka Summit and Current 2022: The Next Generation of Kafka Summit. Having seen hundreds of conference submissions, Robin shares best practices for crafting abstracts that stand out, as well as tips for speaking at conferences. So you want to answer the call for papers? Before writing your abstract, Robin and Kris recommend identifying a topic that you are enthusiastic about, or one that could be useful to others. Often, attendees go to conferences to learn about a given technology that they may not know well yet, so a fundam
16/06/2022 • 48 minutes 56 seconds
How I Became a Developer Advocate
What is a developer advocate, and how do you become one? In this episode, seasoned developer advocates Kris Jenkins (Senior Developer Advocate, Confluent) and Danica Fine (Senior Developer Advocate, Confluent) answer the question by describing how they got into the world of developer relations, what they enjoy most about their roles, and how you can become one. Developer advocacy is at the heart of a developer community. It helps developers and software engineers get the most out of a given technology by providing support in the form of blog posts, podcasts, conference talks, video tutorials, meetups, and other mediums. Before stepping into the world of developer relations, both Danica and Kris were hands-on developers. Alongside his professional work, Kris also devoted personal time to supporting fellow developers by running local meetups, writing blogs, and organizing hackathons. While Danica found her calling after learning more about Apa
09/06/2022 • 29 minutes 48 seconds
Data Mesh Architecture: A Modern Distributed Data Model
Data mesh isn’t software you can download and install, so how do you build a data mesh? In this episode, Adam Bellemare (Staff Technologist, Office of the CTO, Confluent) discusses his data mesh proof of concept and how it can help you see the ways in which implementing a data mesh could benefit your organization. Adam begins by noting that while data mesh is a type of modern data architecture, it is only partially a technical issue. On the technical side, it encompasses the best way to store various data sets and make them accessible to other teams in a distributed organization. Equally, it’s a social issue: getting the various teams in an organization to commit to publishing high-quality versions of their data and making them widely available to everyone else. Adam explains that the four data mesh concepts themselves provide the language needed to start discussing the necessary social transitions that must take place within a company to bring about a better, mo
02/06/2022 • 48 minutes 42 seconds
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools
Stream processing can be hard or easy depending on the approach you take and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With extensive collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving the problems each one presents. The stream processing tools they focus on are Flink and the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant, ksqlDB. Flink and ksqlDB tend to be used by different types of teams, since the two differ in both design and philosophy. Why Use Apache Flink? The teams using Flink are often highly specialized, with deep expertise and an exclusive focus on stream processing. They tend to be res
26/05/2022 • 55 minutes 55 seconds
Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB
Apache Kafka® isn’t just for day jobs, according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too! Building out a practical Apache Kafka® data pipeline is not always complicated; it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted from the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project. She also discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud. Danica's model Kafka pipeline begins with moisture sensors in her plants, whose data is requested by an endless loop in a Python script on her Raspberry Pi. The Pi in turn connects to Kafka on Confluent Cloud, where the plant data
19/05/2022 • 33 minutes 56 seconds
Apache Kafka 3.2 - New Features & Improvements
Apache Kafka® 3.2 delivers new KIPs in three different areas of the Kafka ecosystem: Kafka Core, Kafka Streams, and Kafka Connect. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights. More than half of the KIPs in the new release concern Kafka Core. KIP-704 addresses unclean leader elections by allowing further communication between the controller and the brokers. KIP-764 tackles the problem of many client connections arriving in a short period of time during preferred leader election by adding the configuration `socket.listen.backlog.size`. KIP-784 adds an error code field to the response of the `DescribeLogDirs` API. KIP-788 helps manage network traffic by letting you set the size of the network thread pool individually for each listener on Kafka brokers. Finally, in preparation for KRaft mode, KIP-801 introduces a built-in `StandardAuthorizer` that doesn't depend on ZooKeeper. The
17/05/2022 • 6 minutes 54 seconds
Scaling Apache Kafka Clusters on Confluent Cloud ft. Ajit Yagaty and Aashish Kohli
How much can Apache Kafka® scale horizontally, and how can you automatically balance or rebalance data to ensure optimal performance? You may require the flexibility to scale or shrink your Kafka clusters based on demand. With experience engineering cluster elasticity and capacity management features for cloud-native Kafka, Ajit Yagaty (Confluent Cloud Control Plane Engineering) and Aashish Kohli (Confluent Cloud Product Management) join Kris Jenkins in this episode to explain how the architecture of Confluent Cloud supports elasticity. Kris suggests that optimal elasticity is like water from a faucet: you should be able to obtain as many resources as you need quickly, but at the same time you don't want the slightest amount to go to waste. But how do you specify the amount of capacity by which to adjust, and how do you know when it's necessary? Aashish begins by explaining how elasticity on Confluent Cloud has come a long way since the early days of sc
11/05/2022 • 49 minutes 7 seconds
Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic
What are useful practices for migrating a system to Apache Kafka® and Confluent Cloud, and why use Confluent to modernize your architecture? Dima Kalashnikov (Technical Lead, Picnic Technologies) is part of a small analytics platform team at Picnic, an online-only European grocery store that processes around 45 million customer events and five million internal events daily. An underlying goal at Picnic is to make decisions as data-driven as possible. Dima's team therefore collects events on all aspects of the company, from new stock arriving at the warehouse, to customer behavior on its websites, to statistics related to delivery trucks. Data is sent to internal systems and to a data warehouse. Picnic recently migrated from its existing solution to Confluent Cloud for several reasons: Ecosystem and community: Picnic liked the tooling present in the Kafka ecosystem. Since being a small team means they aren't able to devote extra time to building
05/05/2022 • 34 minutes 41 seconds
Build a Data Streaming App with Apache Kafka and JS - Coding in Motion
Coding is inherently enjoyable and experimental. With the goal of bringing fun into programming, Kris Jenkins (Senior Developer Advocate, Confluent) hosts a new series of hands-on workshops, Coding in Motion, that teaches you how to use Apache Kafka® and data streaming technologies for real-life use cases. In the first episode, Sound & Vision, Kris walks you through the end-to-end process of building a real-time, full-stack data streaming application from scratch using Kafka and JavaScript/TypeScript. During the workshop, you’ll learn to stream musical MIDI data into fully managed Kafka using Confluent Cloud, then process and transform the raw data stream using ksqlDB. Finally, the enriched data streams will be pushed to a web server to display data in a 3D graphical visualization. Listen to Kris preview the first episode of Coding in Motion: Sound & Vision, and join him at the workshop premiere to learn more.
03/05/2022 • 2 minutes 3 seconds
Optimizing Apache Kafka's Internals with Its Co-Creator Jun Rao
You already know Apache Kafka® is a distributed event streaming system for setting your data in motion, but how does its internal architecture work? No one can explain Kafka’s internal architecture better than Jun Rao, one of its original creators and a co-founder of Confluent. Jun has an in-depth understanding of Kafka that few others can claim, and he shares it with us in this episode and in his new Kafka Internals course on Confluent Developer. One of Jun's goals in publishing the Kafka Internals course was to cover the evolution of Kafka since its initial launch. In line with that goal, he discusses the history of Kafka development, including the original thinking behind some of its design decisions, as well as how its features have been improved to better meet its key goals of durability, scalability, and real-time data. With respect to its initial design, Jun relates how Kafka was conceived from the ground up as a distributed syste
28/04/2022 • 48 minutes 54 seconds
Using Event-Driven Design with Apache Kafka Streaming Applications ft. Bobby Calderwood
What is event modeling, and how does it differ from standard data modeling? In this episode of Streaming Audio, Bobby Calderwood, founder of Evident Systems and creator of oNote, observes that at the dawn of the computer age, because memory and computing power were expensive, people began to move away from time-and-narrative-oriented record-keeping systems (in the manner of a ship's log or a financial ledger) to systems based on aggregation. Such data-model systems, still dominant today, retain only the current state generated from their inputs, while the inputs themselves are lost. The converse of the reductive data-model system is the event-model system. It is enabled by tools like Apache Kafka® and effectively saves every bit of activity that the system generates. In a sense, the event model marks a return to the earlier, narrative-like recording methods. To further illustrate, Bobby uses a chess example to show the disti
21/04/2022 • 51 minutes 9 seconds
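Bobby's contrast between a data model that keeps only current state and an event model that keeps every input can be shown in a few lines of Python. The example uses a simple bank-account ledger rather than his chess example, and the event kinds are hypothetical. The current balance is derived by folding over the event log, so it can always be recomputed. Past states that a state-only record would have discarded can also be replayed.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str  # "deposited" or "withdrew"
    amount: int


def apply(balance: int, event: Event) -> int:
    """Compute the next state from the previous state and one event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


# Event model: the append-only log is the source of truth.
log = [Event("deposited", 100), Event("withdrew", 30), Event("deposited", 5)]

# A state-only data model would store just this number and discard the inputs.
current_balance = reduce(apply, log, 0)

# With the log kept, earlier states can be replayed, e.g. after the first two events.
balance_after_two = reduce(apply, log[:2], 0)
print(current_balance, balance_after_two)  # 75 70
```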
Monitoring Extreme-Scale Apache Kafka Using eBPF at New Relic
New Relic runs one of the larger Apache Kafka® installations in the world, ingesting about 125 petabytes a month, or approximately three billion data points per minute. Anton Rodriguez is the architect of the system. He is responsible for hundreds of clusters and thousands of clients, some of them implemented in non-standard technologies. Beyond the sheer number of servers, he works with many teams, which must all work together when issues arise. Monitoring New Relic's large Kafka installation is critical and, of course, challenging, even for a company that itself specializes in monitoring. Specific obstacles include determining when rebalances are happening, identifying particularly old consumers, measuring consumer lag, and finding a way to observe all producing and consuming applications. One way that New Relic has improved the monitoring of its architecture is by consuming metrics directly from the Linux kernel using eBPF, a relatively new kernel technology, which lets programs
13/04/2022 • 38 minutes 25 seconds
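Consumer lag is one of the monitoring obstacles listed in the New Relic episode. It is usually computed per partition as the log-end offset minus the consumer group's committed offset. The Python sketch below assumes you have already fetched both sets of offsets. How you collect them, for example through admin APIs, JMX, or eBPF-based tooling, is outside its scope. The topic name and the numbers are made up, and a missing commit is treated as offset 0 for simplicity.

```python
from __future__ import annotations

Partition = tuple[str, int]  # (topic, partition number)


def consumer_lag(end_offsets: dict[Partition, int],
                 committed: dict[Partition, int]) -> dict[Partition, int]:
    """Per-partition lag = log-end offset - committed offset (never negative)."""
    lag = {}
    for partition, end in end_offsets.items():
        # Simplification: no committed offset means lag is counted from offset 0;
        # real tooling would consider the log start offset and reset policy.
        done = committed.get(partition, 0)
        lag[partition] = max(end - done, 0)
    return lag


end_offsets = {("events", 0): 1_500, ("events", 1): 2_000, ("events", 2): 800}
committed = {("events", 0): 1_450, ("events", 1): 2_000}

lag = consumer_lag(end_offsets, committed)
print(lag)  # {('events', 0): 50, ('events', 1): 0, ('events', 2): 800}
print(sum(lag.values()))  # 850
```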
Confluent Platform 7.1: New Features + Updates
Confluent Platform 7.1 expands upon its already innovative features, adding improvements in key areas that benefit data consistency, allow for increased speed and scale, and enhance resilience and reliability. Previously, the Confluent Platform 7.0 release introduced Cluster Linking, which enables you to bridge on-premises and cloud clusters, among other configurations. Maintaining data quality standards across multiple environments can be challenging, though. To help with this problem, CP 7.1 adds Schema Linking, which lets you share consistent schemas across your clusters, synced in real time. Confluent for Kubernetes lets you build your own private-cloud Apache Kafka® service. Now you can enhance the global resilience of your architecture by deploying to multiple regions. With the new release, you can also configure custom volumes attached to Confluent deployments, and you can declaratively define and manage the new Schema Links. As of this release, Confluent for Kubern
12/04/2022 • 10 minutes 1 second
Scaling an Apache Kafka Based Architecture at Therapie Clinic
Scaling Apache Kafka® can be tricky, let alone scaling a team. When he was first hired, Domenico Fioravanti of Therapie Clinic was given the challenging task of assembling a sizable tech team from scratch while simultaneously building a scalable and decoupled architecture from the ground up. In addition, he wanted to deliver value to the company from day one. One way that Domenico ultimately accomplished these goals was by focusing on managed solutions to avoid large investments in engineering know-how. Another was delivering quickly to production by using his team's existing knowledge. Domenico's biggest initial priority was to build a real-time reporting dashboard that collated data generated by third-party systems, such as call centers and front-of-house software solutions that managed bookings and transactions. (Before Domenico's arrival, all reporting had been done by aggregating data from different sources through an expensive, manual, error-p
07/04/2022 • 1 hour 10 minutes 56 seconds
Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs
What is GraphQL? And how can you combine GraphQL with Apache Kafka® to query data in real time? Gerard Klijs has over 10 years of experience as a backend engineer. He is a Confluent Community Catalyst, a contributor to several GraphQL libraries, and the creator and maintainer of a Rust library for using Confluent Schema Registry in a way that is compatible with the Java client. In this episode, he explains why you might want to use Kafka with GraphQL. He also describes how the two work together to bridge the gap between backend and frontend, making data more easily accessible in the frontend. As an alternative to REST, GraphQL is an open source query language for APIs, originally developed at Facebook (now Meta), which lets you pull data from multiple data sources via a single API call. GraphQL also makes it easy to migrate and deprecate fields. For example, if you have a `name` field that you later decide to replace with `firstName` and `lastName`, you can group the field names together and monitor the server for query requests. If there are no additional query
29/03/2022 • 23 minutes 13 seconds
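The `name` to `firstName`/`lastName` migration from the GraphQL episode can be sketched in plain Python. The old field is still served, derived from the new fields, while the server counts how often clients request it. That count tells you when it is safe to remove the old field. In a real GraphQL schema you would mark the old field with the built-in `@deprecated(reason: ...)` directive. The resolver table, `resolve_person`, and the usage counter here are hypothetical and not tied to any particular GraphQL library.

```python
from __future__ import annotations

from collections import Counter

# Counts requests for deprecated fields, so you can tell when nobody uses them.
deprecated_field_hits: Counter[str] = Counter()

DEPRECATED = {"name"}


def resolve_person(record: dict[str, str], requested: list[str]) -> dict[str, str]:
    """Return only the requested fields, serving `name` from the new fields."""
    resolvers = {
        "firstName": lambda r: r["firstName"],
        "lastName": lambda r: r["lastName"],
        # Old field kept for existing clients, computed from the new fields.
        "name": lambda r: f"{r['firstName']} {r['lastName']}",
    }
    result = {}
    for field_name in requested:
        if field_name in DEPRECATED:
            deprecated_field_hits[field_name] += 1
        result[field_name] = resolvers[field_name](record)
    return result


ada = {"firstName": "Ada", "lastName": "Lovelace"}
print(resolve_person(ada, ["name"]))  # {'name': 'Ada Lovelace'}
print(resolve_person(ada, ["firstName", "lastName"]))  # {'firstName': 'Ada', 'lastName': 'Lovelace'}
print(deprecated_field_hits["name"])  # 1
```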
Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole
Data availability, usability, integrity, and security are words we hear a lot. But what do they actually look like when put into practice? That’s where data governance comes in. This becomes especially tricky when working with real-time data architectures. Tushar Thole (Senior Manager, Engineering, Trust & Security, Confluent) focuses on delivering features for software-defined storage, software-defined networking (SD-WAN), security, and cloud-native domains. In this episode, he explains the importance of real-time data governance. He also discusses Stream Governance, the product portfolio his team has been building to foster the collaboration and knowledge sharing needed to become an event-centric business while staying compliant with an ever-evolving landscape of data regulations. With growing data volume, variety, and velocity, data governance is mandatory for trustworthy, usable, accurate, and accessible data across organizations, especiall
22/03/2022 • 42 minutes 58 seconds
Handling 2 Million Apache Kafka Messages Per Second at Honeycomb
How many messages can Apache Kafka® process per second? At Honeycomb, it's easily over one million messages. In this episode, get a taste of how Honeycomb uses Kafka at massive scale. Liz Fong-Jones (Principal Developer Advocate, Honeycomb) explains how Honeycomb manages Kafka-based telemetry ingestion pipelines and scales Kafka clusters. And what is Honeycomb? Honeycomb is an observability platform that helps you visualize, analyze, and improve cloud application quality and performance. Their data volume has grown by a factor of 10 throughout the pandemic, while the total cost of ownership has only gone up by 20%. But how, you ask? As a developer advocate for site reliability engineering (SRE) and observability, Liz works alongside the platform engineering team on optimizing infrastructure for reliability and cost. Two years ago, the team was facing the prospect of growing from 20 Kafka brokers to 200 Kafka brokers as data volume increased. The cha
15/03/2022 • 41 minutes 36 seconds
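The two figures in the Honeycomb description (data volume up by a factor of 10, total cost of ownership up 20%) imply a large drop in the cost of handling each unit of data. A minimal Python sketch of that arithmetic, using only those two stated figures; the function name is illustrative:

```python
def cost_per_unit_change(volume_factor: float, cost_factor: float) -> float:
    """Relative change in cost per unit of data.

    volume_factor: how many times larger the data volume became (10 means 10x).
    cost_factor: how many times larger total cost became (1.2 means +20%).
    """
    # New unit cost divided by old unit cost, minus 1, gives the relative change.
    return cost_factor / volume_factor - 1.0


change = cost_per_unit_change(volume_factor=10.0, cost_factor=1.2)
print(f"Cost per unit of data changed by {change:.0%}")  # prints: Cost per unit of data changed by -88%
```

Under those figures, each unit of data costs 1.2 / 10 = 0.12 of what it did before, an 88% reduction.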
Why Data Mesh? ft. Ben Stopford
With experience in data infrastructure and distributed data technologies, author of the book “Designing Event-Driven Systems,” Ben Stopford (Lead Technologist, Office of the CTO, Confluent) explains the data mesh paradigm, differences between traditional data warehouses and microservices, as well as how you can get started with data mesh. Unlike standard data architecture, data mesh is about moving data away from a monolithic data warehouse into distributed data systems. Doing so will allow data to be available as a product—this is also one of the four principles of data mesh: data ownership by domain, data as a product, data available everywhere for self-service, and data governed wherever it is. These four principles are technology agnostic, which means that they don’t restrict you to a programming language, Apache Kafka®, or other databases. Data mesh is all about building point-to-point architecture that lets you evolve and accommo
10/03/2022 • 44 minutes 42 seconds
Serverless Stream Processing with Apache Kafka ft. Bill Bejeck
What is serverless? Having worked as a software engineer for over 15 years and as a regular contributor to Kafka Streams, Bill Bejeck (Integration Architect, Confluent) is an Apache Kafka® committer and author of “Kafka Streams in Action.” In today’s episode, he explains what serverless is and the architectural concepts behind it. To clarify, serverless doesn’t mean you can run an application without a server—there are still servers in the architecture, but they are abstracted away from your application development. In other words, you can focus on building and running applications and services without any concerns over infrastructure management. Using a cloud provider such as Amazon Web Services (AWS) enables you to allocate machine resources on demand while handling provisioning, maintenance, and scaling of the server infrastructure. There are a few important terms to know when implementing serverless functions with event stream processors:
03/03/2022 • 42 minutes 23 seconds
The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps
When it comes to Apache Kafka®, there’s no one better to tell the story than Jay Kreps (Co-Founder and CEO, Confluent), one of the original creators of Kafka. In this episode, he talks about the evolution of Kafka from in-house infrastructure to a managed cloud service and discusses what’s next for infrastructure engineers who used to self-manage the workload. Kafka started out at LinkedIn as a distributed stream processing framework and was core to their central data pipeline. At the time, the challenge was to address scalability for real-time data feeds. The social media platform’s initial data system was built on Apache™ Hadoop®, but the team later realized that operationalizing and scaling the system required a considerable amount of work. When they started re-engineering the infrastructure, Jay observed a big gap in data streaming—on one end, data was being looked at constantly for analytics, while on the other end, data was being looked at once a day—missing real-
24/02/2022 • 46 minutes 32 seconds
What’s Next for the Streaming Audio Podcast ft. Kris Jenkins
Meet your new host of the Streaming Audio podcast: Kris Jenkins (Senior Developer Advocate, Confluent)! In this preview, Kris shares a few highlights from forthcoming episodes to look forward to, spanning topics from data mesh, cloud-native technologies, and serverless Apache Kafka®, to data modeling. As a developer advocate, Kris is endlessly fascinated by software design, functional programming, real-time systems, and electronic music. He is a veteran software developer and engineer, with a broad background from roles such as CTO of a Java/Oracle gold exchange and contract developer of several Haskell/PureScript-based event systems. There is still a raft of data streaming narratives to tell and many community experts to feature. We’ll cover what’s new and emerging, real-life Kafka use cases, and how people are currently using managed Kafka as a service, as well as the latest in the data streaming space. If there’s a subject you’d like to see covered on the sho
16/02/2022 • 2 minutes 39 seconds
On to the Next Chapter ft. Tim Berglund
After nearly 200 podcast episodes of Streaming Audio, Tim Berglund bids farewell in his last episode as host of the show. Tim reflects on the many great memories with guests who have appeared on the segment—and each for its own reasons. He has covered a wide variety of topics, ranging from Apache Kafka® fundamentals, microservices, event stream processing, use cases, to cloud-native Kafka, data mesh, and more. As Tim mentions, the Streaming Audio podcast will continue on to explore all things about Kafka and the cloud while featuring new voices and topics. You can subscribe to the Streaming Audio podcast on your podcast platform of choice to get the latest updates and news. Thank you for listening and stay tuned. EPISODE LINKS: I Interviewed Nearly 200 Apache Kafka Experts and I Learned These 10 Things
03/02/2022 • 6 minutes 45 seconds
Intro to Event Sourcing with Apache Kafka ft. Anna McDonald
What is event sourcing and how does it work? Event sourcing is often used interchangeably with event-driven architecture and event stream processing. However, Anna McDonald (Principal Customer Success Technical Architect, Confluent) explains it's a specific category of its own—an event streaming pattern. Anna is passionate about event-driven architectures and event patterns. She’s a tour de force in the Apache Kafka® community and is the presenter of the Event Sourcing and Event Storage with Apache Kafka course on Confluent Developer. In this episode, she previews the course by providing an overview of what event sourcing is and what you need to know in order to build event-driven systems. Event sourcing is an architectural design pattern that defines the approach to handling data operations that are driven by a sequence of events. The pattern ensures that all changes to an application state are captured and stored as an immutable sequence of events, known
01/02/2022 • 30 minutes 14 seconds
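To make the event sourcing idea concrete: current state is never stored directly, but is derived by replaying an immutable, append-only log of events from the beginning. A minimal Python sketch, assuming a hypothetical bank-balance domain with two made-up event types ("Deposited" and "Withdrew"); it is an illustration of the pattern, not code from the course:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    """An immutable fact; the event kinds used here are hypothetical."""
    kind: str    # "Deposited" or "Withdrew"
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """Return the new state after one event; the log itself is never mutated."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events) -> int:
    """Rebuild state by folding every event, oldest first, starting from zero."""
    return reduce(apply_event, events, 0)


# The log is append-only: new events extend it, existing events are never edited.
log = (Event("Deposited", 100), Event("Withdrew", 30))
log = log + (Event("Deposited", 5),)

print(replay(log))      # 75: current state
print(replay(log[:2]))  # 70: state as of the second event
```

Because state is a pure function of the log, replaying a prefix of it reconstructs the state at any earlier point in time.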
Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela
In an effort to make Apache Kafka® cloud native, Anna Povzner (Principal Engineer, Confluent) and Anastasia Vela (Software Engineer I, Confluent) have been working to expand multi-tenancy to cloud-native systems with automated capacity planning and scaling in Confluent Cloud. They explain how cloud-native data systems are different from legacy databases and share the technical requirements needed to create multi-tenancy for managed Kafka as a service. As a distributed system, Kafka is designed to support multi-tenant systems by isolating data with authentication, authorization, and encryption; isolating user namespaces; and isolating performance with quotas. Traditionally, Kafka’s multi-tenant capabilities are used in on-premises data centers to make data available and accessible across the company—a single company would run a multi-tenant Kafka cluster with all its workloads to stream data across organizations. Some processes behind
27/01/2022 • 31 minutes 1 second
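Performance isolation with quotas works by having the broker throttle a client that exceeds its quota, so one tenant's burst does not slow down the others. A simplified Python sketch of that idea, assuming a hypothetical per-tenant byte-rate quota measured over a single fixed window with no window reset. The class and field names are illustrative, and this is not Kafka's or Confluent Cloud's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class TenantQuotas:
    """Hypothetical per-tenant produce quota (simplified: one fixed window, never reset)."""
    quota_bytes_per_sec: dict
    window_sec: float = 1.0
    observed_bytes: dict = field(default_factory=dict)

    def record(self, tenant: str, num_bytes: int) -> float:
        """Record bytes produced by a tenant and return a throttle delay in seconds."""
        total = self.observed_bytes.get(tenant, 0.0) + num_bytes
        self.observed_bytes[tenant] = total
        observed_rate = total / self.window_sec
        quota = self.quota_bytes_per_sec[tenant]
        if observed_rate <= quota:
            return 0.0
        # The delay d solves total / (window_sec + d) == quota, which pulls the
        # tenant's average rate back down to its quota.
        return (observed_rate - quota) / quota * self.window_sec


quotas = TenantQuotas({"tenant-a": 1000.0, "tenant-b": 1000.0})
print(quotas.record("tenant-a", 500))   # 0.0
print(quotas.record("tenant-a", 1500))  # 1.0 (tenant-a is throttled)
print(quotas.record("tenant-b", 800))   # 0.0 (tenant-b is unaffected)
```

Tracking usage per tenant, rather than per cluster, is what keeps one tenant's throttling from affecting another.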
Apache Kafka 3.1 - Overview of Latest Features, Updates, and KIPs
Apache Kafka® 3.1 is here with exciting new features and improvements! On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights that you won’t want to miss, including foreign-key joins in Kafka Streams and improvements that will provide consistency for Kafka latency metrics. KAFKA-13439 deprecates the eager rebalancing protocol in Kafka Streams. The cooperative protocol has been the default since Kafka 2.4, and it’s advised to upgrade your applications to it, as the eager protocol will no longer be supported in future releases. Previously, foreign-key joins in Kafka Streams only worked if both the primary-key and foreign-key tables used the default partitioner. This release adds support for foreign-key joins on tables with custom partitioners, which will be passed in as part of a new `TableJoined` object, comparable to the existing `Joined` and `StreamJoined` objects. With the goal of making Kafka more intuitive, KIP-773 enhances naming consistency for three new client me
24/01/2022 • 4 minutes 43 seconds
Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra
Maximizing cloud Apache Kafka® performance isn’t just about running data processes on cloud instances. There is a lot of engineering work required to set and maintain a high-performance standard for speed and availability. Alok Nikhil (Senior Software Engineer, Confluent) and Adithya Chandra (Staff Software Engineer II, Confluent) share their efforts to optimize Kafka on Confluent Cloud and the three guiding principles they follow, which apply whether you are self-managing Kafka or working on a cloud-native system: know your users and plan for their workloads; infrastructure matters for performance as well as cost efficiency; and effective observability (you can’t improve what you don’t see). A large part of setting and achieving performance standards is about understanding that workloads vary and come with unique requirements. There are different dimensions for performance, such as the number of partitions and the number of connections.
20/01/2022 • 30 minutes 40 seconds
From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine
Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and being on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) shares insights into the development process and how ksqlDB and Kafka Connect became instrumental to the implementation. By moving away from batch processing to streaming data pipelines with Kafka, data can be distributed with increased scalability and resiliency. Kafka decouples the source from the target systems, so you can react to data as it changes while ensuring accurate data in the target system. In order to transition from monolithic micro-batching applications to real-time microservices that can integrate with a legacy system that has been around for decades, Danica and her team started developing Ka
13/01/2022 • 29 minutes 50 seconds
Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik
Getting data from a database management system (DBMS) into Apache Kafka® in real time is a subject of ongoing innovation. John Neal (Principal Solution Architect, Qlik) and Adam Mayer (Senior Technical Product Marketing Manager, Qlik) explain how leveraging change data capture (CDC) for data ingestion into Kafka enables real-time data-driven insights. It can be challenging to ingest data in real time. It is even more challenging when you have multiple data sources, including both traditional databases and mainframes, such as SAP and Oracle. Extracting data in batch for transfer and replication purposes is slow, and often incurs significant performance penalties. However, analytical queries are often even more resource intensive and are prohibitively expensive to run on production transactional databases. CDC enables the capture of source operations as a sequence of incrementing events, converting the data into events to be written to Kafka. Once this data is available
06/01/2022 • 34 minutes 51 seconds
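On the consuming side, CDC output is a sequence of incrementing change events that a target system applies in order to stay in sync with the source. A minimal Python sketch of that replay, assuming a hypothetical event shape with "seq", "op", "key", and "after" fields (this is not Qlik's or any other tool's actual format). It also skips events it has already applied, since a redelivered event should not be applied twice:

```python
def replicate(events: list) -> dict:
    """Apply CDC events in sequence order to an in-memory replica keyed by primary key.

    Each event is a dict with hypothetical fields: "seq" (increasing position in
    the source log), "op" ("insert", "update", or "delete"), "key", and, for
    inserts and updates, "after" (the new row image).
    """
    replica: dict = {}
    last_seq = 0
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["seq"] <= last_seq:
            continue  # already applied, e.g. a redelivered event
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            replica[key] = event["after"]
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
        last_seq = event["seq"]
    return replica


events = [
    {"seq": 1, "op": "insert", "key": 42, "after": {"name": "Ada", "city": "London"}},
    {"seq": 2, "op": "update", "key": 42, "after": {"name": "Ada", "city": "Paris"}},
    {"seq": 3, "op": "insert", "key": 7, "after": {"name": "Lin", "city": "Oslo"}},
    {"seq": 4, "op": "delete", "key": 7},
    {"seq": 2, "op": "update", "key": 42, "after": {"name": "Ada", "city": "Paris"}},  # duplicate
]
print(replicate(events))  # {42: {'name': 'Ada', 'city': 'Paris'}}
```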
Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris
It’s been said that financial services organizations have been early Apache Kafka® adopters due to the strong delivery guarantees and scalability that Kafka provides. With experience working on and designing architectural solutions for financial services, Fotios Filacouris (Senior Solutions Engineer, Enterprise Solutions Engineering, Confluent) joins Tim to discuss how Kafka and Confluent help banks build modern architectures, highlighting key emerging use cases from the sector. Previously, Kafka was often viewed as a simple pipe that connected databases together, which allowed for easy and scalable data migration. As the Kafka ecosystem evolves with added components like ksqlDB, Kafka Streams, and Kafka Connect, the implementation of Kafka goes beyond being just a pipe—it’s an intelligent pipe that enables real-time, actionable data insights. Fotios shares a couple of use cases showcasing how Kafka solves the problems that many banks are facing today. One of his customers
28/12/2021 • 34 minutes 59 seconds
Running Hundreds of Stream Processing Applications with Apache Kafka at Wise
What’s it like building a stream processing platform with around 300 stateful stream processing applications based on Kafka Streams? Levani Kokhreidze (Principal Engineer, Wise) shares his experience building such a platform that the business depends on for multi-currency movements across the globe. He explains how his team uses Kafka Streams for real-time money transfers at Wise, a fintech organization that facilitates international currency transfers for 11 million customers. Getting to this point and expanding the stream processing platform is not, however, without its challenges. One of the major challenges at Wise is to aggregate, join, and process real-time event streams to transfer currency instantly. To accomplish this, Wise relies on Apache Kafka® as an event broker, as well as Kafka Streams, the accompanying Java stream processing library. Kafka Streams lets you build event-driven microservices for processing streams, which can then be deployed alongside the Kaf
21/12/2021 • 31 minutes 8 seconds
Lessons Learned From Designing Serverless Apache Kafka ft. Prachetaa Raghavan
You might call building and operating Apache Kafka® as a cloud-native data service synonymous with a serverless experience. Prachetaa Raghavan (Staff Software Developer I, Confluent) spends his days focused on this very thing. In this podcast, he shares his learnings from implementing a serverless architecture on Confluent Cloud using Kubernetes Operator. Serverless is a cloud execution model that abstracts away server management, letting you run code on a pay-per-use basis without infrastructure concerns. Confluent Cloud's major design goal was to create a serverless Kafka solution, including handling its distributed state, its performance requirements, and seamlessly operating and scaling the Kafka brokers and ZooKeeper. The serverless offering is built on top of an event-driven microservices architecture that allows you to deploy services independently, each with its own release cadence and maintained at the team level. There are four subjects that help create the serve
14/12/2021 • 28 minutes 20 seconds
Using Apache Kafka as Cloud-Native Data System ft. Gwen Shapira
What does cloud native mean, and what are some design considerations when implementing cloud-native data services? Gwen Shapira (Apache Kafka® Committer and Principal Engineer II, Confluent) addresses these questions in today’s episode. She shares her learnings by discussing a series of technical papers published by her team, which explain what they’ve done to expand Kafka’s cloud-native capabilities on Confluent Cloud. Gwen leads the Cloud-Native Kafka team, which focuses on developing new features to evolve Kafka to its next stage as a fully managed cloud data platform. Turning Kafka into a self-service platform is not entirely straightforward. However, Kafka’s early investment in elasticity, scalability, and multi-tenancy to run at a company-wide scale served as the North Star in taking Kafka to its next stage—a fully managed cloud service where users will just need to send us their workloads and everything else will magically work. Through examining modern cloud-nati
07/12/2021 • 33 minutes 57 seconds
ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury
What is ksqlDB and how does Simon Aubury (Principal Data Engineer, Thoughtworks) use it to track down the plane that wakes his cat Snowy in the morning? Experienced in building real-time applications with ksqlDB since its genesis, Simon provides an introduction to ksqlDB by sharing some of his projects and use cases. ksqlDB is a database purpose-built for stream processing applications and lets you build real-time data streaming applications with SQL syntax. ksqlDB reduces the complexity of having to code with Java, making it easier to achieve outcomes through declarative programming, as opposed to procedural programming. Before ksqlDB, you could use the producer and consumer APIs to get data in and out of Apache Kafka®; however, when it comes to data enrichment, such as joining, filtering, mapping, and aggregating data, you would have to use the Kafka Streams API—a robust and scalable programming interface influenced by the JVM ecosystem that requires Java programming
01/12/2021 • 30 minutes 42 seconds
Explaining Stream Processing and Apache Kafka ft. Eugene Meidinger
Many of us find ourselves in the position of equipping others to use Apache Kafka® after we’ve gained an understanding of what Kafka is used for. But how do you communicate and teach others event streaming concepts effectively? As a Pluralsight instructor and business intelligence consultant, Eugene Meidinger shares tips for creating consumable training materials for conveying event streaming concepts to developers and IT administrators who are trying to get on board with Kafka and stream processing. Eugene’s background as a database administrator (DBA) and immense knowledge of event streaming architecture and data processing show as he reveals his learnings from years of working with Microsoft Power BI, Azure Event Hubs, data processing, and event streaming with ksqlDB and Kafka Streams. Eugene mentions the importance of understanding your audience, their pain points, and their questions, such as: Why was Kafka invented? Why does ksqlDB matter? It also helps to use m
23/11/2021 • 29 minutes 28 seconds
Handling Message Errors and Dead Letter Queues in Apache Kafka ft. Jason Bell
If you’ve ever wondered what exactly dead letter queues (DLQs) are and how to use them, Jason Bell (Senior DataOps Engineer, Digitalis) has an answer for you. Dead letter queues are a feature of Kafka Connect that acts as the destination for failed messages due to errors like improper message deserialization and improper message formatting. Lots of Jason’s work is around Kafka Connect and the Kafka Streams API, and in this episode, he explains the fundamentals of dead letter queues, how to use them, and the parameters around them. For example, when deserializing an Avro message, the deserialization could fail if the message passed through is not Avro or has a value that doesn’t match the expected wire format, at which point the message will be rerouted into the dead letter queue, itself an Apache Kafka® topic, for reprocessing. From there, the message can be reprocessed with the appropriate converter and sent back to the sink. For a JSON error message, you’ll need another JSON connector to process
16/11/2021 • 37 minutes 41 seconds
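In Kafka Connect, a sink connector enables this behavior through its error-handling settings, such as `errors.tolerance=all` (keep going instead of failing the task) and `errors.deadletterqueue.topic.name` (the topic that receives failed records). The routing logic itself can be sketched in a few lines of Python. In this minimal sketch, JSON parsing stands in for Avro deserialization, list indices stand in for offsets, and all names are hypothetical:

```python
import json
from dataclasses import dataclass, field


@dataclass
class SinkResult:
    delivered: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)


def process(raw_messages: list) -> SinkResult:
    """Deserialize each raw message; route failures to a dead letter list instead of stopping."""
    result = SinkResult()
    for offset, raw in enumerate(raw_messages):
        try:
            record = json.loads(raw)
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            # Keep the original bytes plus error context so the message can be
            # inspected, fixed, and reprocessed later.
            result.dead_letters.append(
                {"offset": offset, "value": raw, "error": type(err).__name__}
            )
            continue
        result.delivered.append(record)
    return result


result = process([b'{"id": 1}', b"not json", b'{"id": 2}'])
print(result.delivered)                  # [{'id': 1}, {'id': 2}]
print(result.dead_letters[0]["offset"])  # 1
```

Keeping the raw bytes, rather than a partially parsed value, is what makes later reprocessing with a different converter possible.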
Confluent Platform 7.0: New Features + Updates
Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630: Kafka Raft Snapshot, KIP-745: Connect API to restart connector and task, and KIP-695: Further improve Kafka Streams timestamp synchronization. Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements to the 7.0 release, including the ability to create a real-time bridge from on-premises environments to the cloud with Cluster Linking. Cluster Linking allows you to create a single cluster link between multiple environments from Confluent Platform to Confluent Cloud, which is available on public clouds like AWS, Google Cloud, and Microsoft Azure, removing the need for numerous point-to-point connections. Consumers reading from a topic in one environment can read from the same topic in a different environment without risks of reprocessing or missing critical messages. This provides ope
09/11/2021 • 12 minutes 16 seconds
Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck
Kafka Streams is a native streaming library for Apache Kafka® that consumes messages from Kafka to perform operations like filtering a topic’s messages and producing output back into Kafka. After working as a developer in stream processing, Bill Bejeck (Apache Kafka Committer and Integration Architect, Confluent) has found his calling in sharing knowledge and authoring his book, “Kafka Streams in Action.” As a Kafka Streams expert, Bill is also the author of the Kafka Streams 101 course on Confluent Developer, where he delves into what Kafka Streams is, how to use it, and how it works. Kafka Streams provides the abstraction over Kafka consumers and producers by minimizing administrative details like the need to code and manage frameworks required when using plain Kafka consumers and producers to process streams. Kafka Streams is declarative—you can state what you want to do, rather than how to do it. Kafka Streams leverages the KafkaConsumer protocol internally; it inherits it
04/11/2021 • 35 minutes 32 seconds
Automating Infrastructure as Code with Apache Kafka and Confluent ft. Rosemary Wang
Managing infrastructure as code (IaC) instead of using manual processes makes it easy to scale systems and minimize errors. Rosemary Wang (Developer Advocate, HashiCorp, and author of “Essential Infrastructure as Code: Patterns and Practices”) is an infrastructure engineer at heart and an aspiring software developer who is passionate about teaching patterns for infrastructure as code to simplify processes for system admins and software engineers familiar with Python, provisioning tools like Terraform, and cloud service providers. The definition of infrastructure has expanded to include anything that delivers or deploys applications. Infrastructure as software or infrastructure as configuration, according to Rosemary, are ideas grouped behind infrastructure as code—the process of automating infrastructure changes in a codified manner, which also applies to DevOps practices, including version control, continuous integration, continuous delivery, and continuous deployment. Whet
26/10/2021 • 30 minutes 8 seconds
Getting Started with Spring for Apache Kafka ft. Viktor Gamov
What’s the distinction between the Spring Framework and Spring Boot? If you are building a car, the Spring Framework is the engine while Spring Boot gives you the vehicle that you ride in. With experience teaching and answering questions on how to use Spring and Apache Kafka® together, Viktor Gamov (Principal Developer Advocate, Kong) designed a free course on Confluent Developer and previews it in this episode. Not only this, but he also explains why the opinionated Spring Framework would be a good hero in Marvel. Spring is an ever-evolving framework that embraces modern, cloud-native technologies with cross-language options, such as Kotlin integration. Unlike its predecessors, the Spring Framework supports a modern version of Java and the requirements of the Twelve-Factor App manifesto for you to move an application between environments without changing the code. With that engine in place, Spring Boot introduces a microservices architecture. Spring Boot contains databases a
19/10/2021 • 32 minutes 44 seconds
Powering Event-Driven Architectures on Microsoft Azure with Confluent
When you order a pizza, what if you knew every step of the process from the moment it goes in the oven to being delivered to your doorstep? Event-Driven Architecture is a modern, data-driven approach that describes “events” (i.e., something that just happened). A real-time data infrastructure enables you to provide such event-driven data insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solutions Architect, Confluent) discuss use cases on leveraging Confluent Cloud and Microsoft Azure to power real-time, event-driven architectures. As an Apache Kafka® community stalwart, Israel focuses on helping customers and independent software vendor (ISV) partners build solutions for the cloud and use open source databases and architecture solutions like Kafka, Kubernetes, Apache Flink, MySQL, and PostgreSQL on Microsoft Azure. He’s worked with retailers and th
14/10/2021 • 38 minutes 42 seconds
Automating DevOps for Apache Kafka and Confluent ft. Pere Urbón-Bayes
Autonomy is key in building a sustainable and motivated team, and this core principle also applies to DevOps. Building self-serve Apache Kafka® and Confluent Platform deployments requires a streamlined process with unrestricted tools—a centralized processing tool that allows teams in large or mid-sized organizations to automate infrastructure changes while ensuring shared standards are met. With more than 15 years of engineering and technology consulting experience, Pere Urbón-Bayes (Senior Solution Architect, Professional Services, Confluent) built an open source solution—JulieOps—to enable a self-serve Kafka platform as a service with data governance. JulieOps is one of the first solutions available to realize self-service for Kafka and Confluent with automation. Development, operations, and security teams often face hurdles when deploying Kafka. How can a user request the topics that they need for their applications? How can the operations team ensure compliance and role-based
07/10/2021 • 26 minutes 8 seconds
Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt
Kafka Connect is a streaming integration framework between Apache Kafka® and external systems, such as databases and cloud services. With expertise in ksqlDB and Kafka Connect, Robin Moffatt (Staff Developer Advocate, Confluent) helps and supports the developer community in understanding Kafka and its ecosystem. Recently, Robin authored a Kafka Connect 101 course that will help you understand the basic concepts of Kafka Connect, its key features, and how it works. What’s Kafka Connect, and how does it work with Kafka and brokers? Robin explains that Kafka Connect is a Kafka API that runs separately from the Kafka brokers, running on its own Java virtual machine (JVM) process known as the Kafka Connect worker. Kafka Connect is essential for streaming data from different sources into Kafka and from Kafka to various targets. With Connect, you don’t have to write programs in Java; instead, you specify your pipeline using configuration. As a pluggable framewor
28/09/2021 • 31 minutes 18 seconds
Designing a Cluster Rollout Management System for Apache Kafka ft. Twesha Modi
As one of the top coders of her Java coding class in high school, Twesha Modi is continuing to follow her passion for computer science as a senior at Cornell University, where she has proven to be one of the top programmers. During Twesha's summer internship at Confluent, she contributed to designing a new service to automate Apache Kafka® cluster rollout management—a process that releases the latest Kafka versions to customers’ clusters in Confluent Cloud. During Twesha’s internship, she was part of the Platform team, which designed a cluster management rollout service—capable of automating cluster rollout and generating rollout plans that streamline Kafka operations in the cloud. The pre-existing manual process worked well in scenarios involving just a couple hundred clusters, but with growth and the need to upgrade a significantly larger cluster fleet to target versions in the cloud, the process needed to be automated in order to accelerate feature releases while ensur
23/09/2021 • 30 minutes 8 seconds
Apache Kafka 3.0 - Improving KRaft and an Overview of New Features
Apache Kafka® 3.0 is out! To spotlight major enhancements in this release, Tim Berglund (Apache Kafka Developer Advocate) provides a summary of what’s new in the Kafka 3.0 release from Krakow, Poland, including API changes and improvements to the early-access Kafka Raft (KRaft). KRaft is a built-in Kafka consensus mechanism that’s replacing Apache ZooKeeper going forward. It is recommended to try out new KRaft features in a development environment, as KRaft is not advised for production yet. One of the major features in Kafka 3.0 is the ability of KRaft controllers and brokers to efficiently store, load, and replicate snapshots of the metadata topic partition. The Kafka controller is now responsible for generating a Kafka producer ID in both ZooKeeper and KRaft, easing the transition from ZooKeeper to KRaft on the Kafka 3.X version line. This update also moves us closer to the ZooKeeper-to-KRaft bridge release. Additionally, this release includes metadata improve
21/09/2021 • 15 minutes 17 seconds
How to Build a Strong Developer Community with Global Engagement ft. Robin Moffatt and Ale Murray
A developer community brings people with shared interests and purpose together. The fundamental elements of a community are to gather, learn, support, and create opportunities for collaboration. A developer community is also an effective and efficient instrument for exploring and solving problems together. The power of a community is its endless advantages, from knowledge sharing to support, interesting discussions, and much more. Tim Berglund invites Ale Murray (Global Community Manager, Confluent) and Robin Moffatt (Staff Developer Advocate, Confluent) on the show to discuss the art of Q&A in a global community, share tips for building a vibrant developer community, and highlight the five strategic pillars for running a successful global community: meetups, conferences, an MVP program (e.g., Confluent Community Catalysts), community hackathons, and digital platforms. Digital platforms, such as a community Slack and forum, ofte
14/09/2021 • 35 minutes 18 seconds
What Is Data Mesh, and How Does it Work? ft. Zhamak Dehghani
The data mesh architectural paradigm shift is all about moving analytical data away from a monolithic data warehouse or data lake into a distributed architecture—allowing data to be shared for analytical purposes in real time, right at the point of origin. The idea of data mesh was introduced by Zhamak Dehghani (Director of Emerging Technologies, Thoughtworks) in 2019. Here, she provides an introduction to data mesh and the fundamental problems that it’s trying to solve. Zhamak explains that the complexity and ambition to use data have grown in today’s industry. But what is data mesh? For over half a century, we’ve been trying to democratize data to deliver value and provide better analytic insights. With the ever-growing number of distributed domain data sets, diverse information arrives in increasing volumes and with high velocity. To remove the friction and serve the requirement for data to be consumed by operational needs in various use cases, the best way is t
09/09/2021 • 34 minutes 56 seconds
Multi-Cluster Apache Kafka with Cluster Linking ft. Nikhil Bhatia
Note: This episode was recorded when Cluster Linking was in preview mode. It’s now generally available as part of the Confluent Q3 ’21 release on August 17, 2021. Infrastructure needs to react in real time to support globally distributed events, such as cloud migration, IoT, edge data collection, and disaster recovery. To provide a seamless, cloud-native, cross-cluster topic replication experience, Nikhil Bhatia (Principal Engineer I, Product Infrastructure, Confluent) and the team engineered a solution called Cluster Linking. Available on Confluent Cloud, Cluster Linking is an API that enables Apache Kafka® to work across multiple datacenters, making it possible to design globally available distributed systems. As industries adopt multi-cloud usage and depart from on-premises and single-cluster operations, we need to rethink how clusters operate across regions …
31/08/2021 • 31 minutes 4 seconds
Using Apache Kafka and ksqlDB for Data Replication at Bolt
What does a ride-hailing app that offers micromobility and food delivery services have to do with data in motion? In this episode, Ruslan Gibaiev (Data Architect, Bolt) talks about Bolt’s road to adopting Apache Kafka® and ksqlDB for stream processing to replicate data from transactional databases to analytical warehouses. Rome wasn't built overnight, nor was the adoption of Kafka and ksqlDB at Bolt. Initially, Bolt recognized the need to standardize its systems and replace an unreliable query-based change data capture (CDC) process. As an experienced Kafka developer, Ruslan believed that Kafka was the right choice for adopting change data capture as a company-wide event streaming solution. Getting the team at Bolt to buy in was hard at first, but Ruslan made it happen. Eventually, the team replaced query-based CDC with log-based CDC from Debezium, built on top of Kafka. Shortly after the implementation, developers at Bolt began to see precise, correct …
26/08/2021 • 29 minutes 15 seconds
Placing Apache Kafka at the Heart of a Data Revolution at Saxo Bank
Monolithic applications present challenges for organizations like Saxo Bank, including difficulties with moving to the cloud, data efficiency, and data management in a regulated environment. Graham Stirling, head of data platforms at Saxo Bank and a self-proclaimed recovering architect on the path to delivery, shares his experience over the last 2.5 years as Saxo Bank placed Apache Kafka® at the heart of the company—something they call a data revolution. Before adopting Kafka, Saxo Bank encountered scalability problems. They previously relied on a centralized data engineering team, used the database as an integration point, and treated their data warehouse as the center of the analytical universe. However, this needed to evolve. For a better data strategy, Graham turned his attention toward a data mesh architecture: create a self-serve platform that enables domain teams to publish and consume data assets …
19/08/2021 • 28 minutes 37 seconds
Advanced Stream Processing with ksqlDB ft. Michael Drogalis
ksqlDB makes it easy to read, write, process, and transform data on Apache Kafka®, the de facto event streaming platform. With simple SQL syntax, pre-built connectors, and materialized views, ksqlDB’s stream processing capabilities enable you to quickly start processing real-time data at scale. But how does ksqlDB work? In this episode, Michael Drogalis (Principal Product Manager, Product Management, Confluent) previews an all-new Confluent Developer course, Inside ksqlDB, where he provides a full overview of ksqlDB’s internal architecture and delves into advanced ksqlDB features. When it comes to ksqlDB or Kafka Streams, there’s one principle to keep in mind: ksqlDB and Kafka Streams share a runtime. ksqlDB runs its SQL queries by dynamically writing Kafka Streams topologies. Confluent Cloud makes it even easier to use ksqlDB. Once you are familiar with ksqlDB’s basic design, you’ll be able to troubleshoot problems and build real-time applications …
11/08/2021 • 28 minutes 26 seconds
Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour
Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial for a marketing automation platform like Mailchimp. In this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company’s largest source of streaming data. Mailchimp works much like a post office, except that instead of physical parcels, it sends billions of emails per day. Monitoring the state of each email provides visibility into this core business function, and it also surfaces information about the health of both internal and remote message transfer agents (MTAs). Tracking those MTA systems in real time is pivotal to the success of the business. Mailchimp is an early Apache Kafka® adopter that started using the technology in 2014, before ksqlDB, Kafka Connect, and Kafka Streams came into the picture. The stream processing applications that they were building faced many complexities …
05/08/2021 • 31 minutes 32 seconds
Collecting Data with a Custom SIEM System Built on Apache Kafka and Kafka Connect ft. Vitalii Rudenskyi
The best-informed business insights that support better decision-making begin with data collection, ahead of data processing and analytics. Enterprises today are flooded with data from sources ranging from cloud services and applications to thousands of internal servers. The massive volume of data that organizations must process presents data ingestion challenges for many large companies. In this episode, data security engineer Vitalii Rudenskyi discusses the decision to replace a vendor security information and event management (SIEM) system with a custom solution built on Apache Kafka® and Kafka Connect for a better data collection strategy. Having a data collection infrastructure layer is mission critical for Vitalii and the team in helping enterprises protect data and detect security events. Built on Kafka, their custom SIEM infrastructure is configurable and designed to ingest and analyze huge amounts of data, including …
27/07/2021 • 25 minutes 14 seconds
Consistent, Complete Distributed Stream Processing ft. Guozhang Wang
Stream processing has become an important part of the big data landscape as a new programming paradigm for implementing real-time, data-driven applications. One of the biggest challenges for streaming systems is providing correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in stream processing in Apache Kafka, in order to shed light on what a reimagined, modern infrastructure looks like. In his white paper, titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics: streaming correctness challenges; stream processing with Kafka; exactly-once in Kafka Streams; …
22/07/2021 • 29 minutes
Powering Real-Time Analytics with Apache Kafka and Rockset
Using large amounts of streaming data increasingly requires interactive, real-time analytics and dashboards—and this applies to any industry, including tech. Dhruba Borthakur, CTO and co-founder of Rockset, shares how his company uses Apache Kafka® to perform complex joins, search, and aggregations on streaming data with low latency. Kafka database integrations allow his team to build a cloud-native analytics database that is a fundamental piece of enterprise infrastructure. Apps in e-commerce, logistics, and manufacturing, in particular, typically receive over 20 million events a day. As those events roll in, it becomes even more critical that the real-time index can be queried with low latency. This way, you can build high-performing, scalable dashboards that let your organization use clickstream and behavioral data to inform decisions and responses to consumer behavior. Typically, the data follows these steps: events come in from mobile or web apps, such as …
15/07/2021 • 25 minutes 44 seconds
Automated Event-Driven Architectures and Microservices with Apache Kafka and SmartBear
Is it possible to automate the adoption of your event-driven architectures and microservices? The answer is yes! Alianna Inzana, product leader for API testing and virtualization at SmartBear, uses an evolutionary model to make event services reusable, functional, and strategic for both in-house needs and clients. SmartBear relies on Apache Kafka® to drive its automated microservices solutions forward through scaled architecture and adaptive workflows. Although the path to adoption may differ across use cases and client requirements, it is all about maturity and API lifecycle management. As your services mature and grow, so should your event streaming architecture. The data your organization collects is no longer in a silo—rather, it has to be accessible across several events. The best architecture can handle these fluctuations. Alianna explains that although the result of these requirements is an architectural pattern, it doesn’t start that way. Instead, these …
08/07/2021 • 29 minutes 53 seconds
Data-Driven Digitalization with Apache Kafka in the Food Industry at BAADER
Coming out of university, Patrick Neff (Data Scientist, BAADER) was used to “perfect” example datasets. However, he soon realized that in the real world, data is often either unavailable or unstructured. This compelled him to learn more about collecting data, analyzing it in a smart and automated way, and exploring Apache Kafka® as a core ecosystem while at BAADER, a global provider of food processing machines. After Patrick began working with Apache Kafka in 2019, he developed several microservices with Kafka Streams and used Kafka Connect for various data analytics projects. Focused on the food value chain, Patrick’s mission is to optimize processes, specifically around transportation and processing. While consulting for one customer, Patrick identified room for improvement related to animal welfare, lost revenue, unnecessary costs, and carbon dioxide emissions. He also noticed that machines are often ready to send data into the cloud, but the correct presentation and/or …
29/06/2021 • 27 minutes 53 seconds
Chaos Engineering with Apache Kafka and Gremlin
The most secure clusters aren’t built on the hope that they’ll never break. They are the clusters that are broken on purpose, with a specific goal. When organizations want to avoid systemic weaknesses, chaos engineering with Apache Kafka® is the route to go. Your system is only as reliable as its most vulnerable point. Patrick Brennan (Principal Architect) and Tammy Butow (Principal SRE) from Gremlin discuss how they do their own chaos engineering to manage and resolve high-severity incidents across the company. But why would an engineer break things when they would then have to fix them? Brennan explains that finding weaknesses in the cloud environment helps Gremlin avoid lengthy downtime when there is an issue (not if, but when), halt the lost revenue that results from service interruptions, maintain customer satisfaction with their stream processing services, and steer clear of burnout for the SRE team. Chaos engineering is …
22/06/2021 • 35 minutes 32 seconds
Boosting Security for Apache Kafka with Confluent Cloud Private Link ft. Dan LaMotte
Confluent Cloud isn’t just for public access anymore. As the requirement for security across sectors increases, so does the need for virtual private cloud (VPC) connections. It is becoming more common to come across Apache Kafka® implementations that use the latest private link connectivity option. In the past, most Confluent Cloud users were satisfied with public connectivity paths and VPC peering. However, enabling private links in the cloud is increasingly important for security across networks and even for the reliability of stream processing. Dan LaMotte, who has since this recording become a staff software engineer II, and his team are focused on making secure connections for customers using Confluent Cloud. This is done by allowing two VPCs to connect without sharing their private IP address spaces. There’s no crossover between them, which lends itself to secure, unidirectional connectivity from customer to service provider without sharing IPs.
15/06/2021 • 25 minutes 55 seconds
Confluent Platform 6.2 | What’s New in This Release + Updates
Based on Apache Kafka® 2.8, Confluent Platform 6.2 introduces Health+, which offers intelligent alerting, cloud-based monitoring tools, and accelerated support so that you can get notified of potential issues before they manifest as critical problems that lead to downtime and business disruption. Health+ provides ongoing, real-time analysis of performance and cluster metadata for your Confluent Platform deployment, collecting only metadata so that you can continue managing your deployment as you see fit, with complete control. With cluster metadata continuously analyzed through an extensive library of expert-tested rules and algorithms, Health+ helps you quickly gain insight into cluster performance and spot potential problems before they occur. To ensure complete visibility, organizations can customize the types of notifications that they receive and choose to receive them via Slack, email, or webhook. Each notification that you receive is aimed at avoiding …
10/06/2021 • 9 minutes 20 seconds
Adopting OpenTelemetry in Confluent and Beyond ft. Xavier Léauté
Collecting internal, operational telemetry from Confluent Cloud services and thousands of clusters is no small feat. Stakeholders need to rely on the same data to make operational decisions. Whether it be metrics from clusters in Confluent Cloud or traces from our internal services, this data provides valuable insights not only to engineering teams but also to customers, for their own operations and for business reporting needs. Traditionally, this data needs to be collected in multiple ways to satisfy all the different requirements. We leverage third-party vendors for our operational needs, which usually means deploying vendor agents or libraries in addition to our own, since we also need to collect some of the same data to expose to customers. However, this sometimes leads to discrepancies between systems, which are often hard to reconcile and make it harder to troubleshoot issues across engineering, data science, and other teams. One of the earliest software engineers …
08/06/2021 • 32 minutes 52 seconds
Running Apache Kafka Efficiently on the Cloud ft. Adithya Chandra
Focused on optimizing Apache Kafka® performance with maximum efficiency, Confluent’s Product Infrastructure team has been actively exploring opportunities for scaling out Kafka clusters. They are able to run Kafka workloads with half the typical memory usage while saving infrastructure costs, an approach they have tested and now safely rolled out across Confluent Cloud. After spending seven years at Amazon Web Services (AWS) as a software engineer working on search services and Amazon Aurora, Adithya Chandra decided to bring his expertise in cluster management, load balancing, elasticity, and performance of search and storage clusters to the Confluent team. Last year, Confluent shipped Tiered Storage, which moves eligible data from a Kafka broker to remote storage. As most of the data moves to remote storage, we can upgrade to better storage volumes backed by solid-state drives (SSDs). SSDs offer higher throughput than hard disk drives (HDDs) and are capable of faster …
25/05/2021 • 38 minutes 35 seconds
Engaging Database Partials with Apache Kafka for Distributed System Consistency ft. Pat Helland
When compiling database reports using a variety of data from different systems, obtaining the right data in real time, when you need it, can be difficult. Focusing on cloud connectivity and distributed data pipelines, Pat Helland (Principal Architect, Salesforce) explains how to work with educated partial answers when using the Apache Kafka® platform. After all, you can’t get guarantees across a distance, which makes it critical to consider partial results. Despite best efforts, managing systems from a distance can result in lag. The secret, according to Helland, is to anticipate these situations and have a plan for when (not if) they happen. Your outputs may be incomplete from time to time, but that doesn’t mean that there isn’t valuable information and data to be shared. Although you cannot guarantee that stream data will be available when you need it, you can gather replicas within a batch to obtain a consistent result, also known as convergence. Distributed systems of …
20/05/2021 • 42 minutes 9 seconds
The Truth About ZooKeeper Removal and the KIP-500 Release in Apache Kafka ft. Jason Gustafson and Colin McCabe
Jason Gustafson and Colin McCabe, Apache Kafka® developers, discuss the project to remove ZooKeeper—now known as the KRaft (Kafka on Raft) project. A previous episode of Streaming Audio featured both developers before the release of Apache Kafka 2.8. Now they’re back to share their progress. The KRaft code has been merged (and continues to be merged) in phases. Both developers talk about the foundational Kafka Improvement Proposals (KIPs), such as KIP-595: a Raft protocol for Kafka, and KIP-631: the quorum-based Kafka controller. The idea behind this release was to give users a chance to try out ZooKeeper-free mode for themselves. There are a lot of exciting milestones on the way for KRaft. The next release will feature Raft snapshot support, as well as support for running with security authorizers enabled.
13/05/2021 • 31 minutes 50 seconds
Resilient Edge Infrastructure for IoT Using Apache Kafka ft. Kai Waehner
What is the internet of things (IoT), and how does it relate to event streaming and Apache Kafka®? Deploying Kafka outside the datacenter creates many new possibilities for processing data in motion and building new business cases. In this episode, Kai Waehner, field CTO and global technology advisor at Confluent, discusses the intersection of edge data infrastructure, IoT, and cloud services for Kafka. He also details how businesses get into sticky situations by not planning for solutions when data is running dangerously close to the edge. Air-gapped environments and strong security requirements are the norm in many edge deployments. How you define the edge depends on your sector, the amount of data involved, and your level of interaction with customers. The edge could lie at various points on the spectrum and mean different things to different people. Before you can deploy Kafka to the edge, you must first define where that edge is as it relates to …
04/05/2021 • 27 minutes 19 seconds
Data Management and Digital Transformation with Apache Kafka at Van Oord
Imagine if you could create a better world for future generations simply by delivering marine ingenuity. Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure. Real-time insights into costs, project progress, and the performance of vessels and equipment are essential for surviving as a business. Becoming a data-driven company requires that all data be connected, synchronized, and visualized—in fact, truly digitized. This requires a central nervous system that supports: legacy (monolith environment) as well as microservices; ELT/ETL/streaming ETL; all types of data, including transactional, streaming, geo, machine, and (sea) survey/bathymetry; and a master data/enterprise common data model. The need for agility and speed makes it necessary …
29/04/2021 • 28 minutes 28 seconds
Powering Microservices Using Apache Kafka on Node.js with KafkaJS at Klarna ft. Tommy Brunn
At Klarna, Lead Engineer Tommy Brunn is building a runtime platform for developers. Outside of his professional role, he is also one of the authors of KafkaJS, the JavaScript client for Apache Kafka®, which has grown from a niche open source project into the most downloaded Kafka client for Node.js since 2018. Using Kafka in Node.js previously meant relying on community-contributed bindings to librdkafka, which required you to spend more time debugging failed builds than working on your application. With the original authors moving away from supporting the bindings, and the community only partially picking up the slack, using Kafka on Node.js was a painful proposition. Kafka is a core part of Klarna’s microservice architecture, with hundreds of services using it to communicate among themselves. In 2017, as their engineering team was building the ecosystem of Node.js services powering the Klarna app, it was clear that the experience of working with …
22/04/2021 • 31 minutes 3 seconds
Apache Kafka 2.8 - ZooKeeper Removal Update (KIP-500) and Overview of Latest Features
Apache Kafka 2.8 is out! This release includes early access to the long-anticipated ZooKeeper removal encapsulated in KIP-500, as well as other key updates, including the addition of a Describe Cluster API, support for mutual TLS authentication on SASL_SSL listeners, exposed task configurations in the Kafka Connect REST API, the removal of a properties argument for the TopologyTestDriver, the introduction of a Kafka Streams-specific uncaught exception handler, improved handling of window size in Streams, and more. EPISODE LINKS: Read about what’s new in Apache Kafka 2.8; Check out the Apache Kafka 2.8 release notes; Watch the video version of this podcast
19/04/2021 • 10 minutes 48 seconds
Connecting Azure Cosmos DB with Apache Kafka - Better Together ft. Ryan CrawCour
When building solutions for customers in Microsoft Azure, it is not uncommon to come across customers who are deeply entrenched in the Apache Kafka® ecosystem and want to continue expanding within it. Thus, figuring out how to connect Azure first-party services to this ecosystem is of the utmost importance. Ryan CrawCour is a Microsoft engineer who has been working on all things data and analytics for the past 10+ years, including building out services like Azure Cosmos DB, which is used by millions of people around the globe. More recently, Ryan has taken a customer-facing role where he gets to help customers build the best solutions possible using Microsoft Azure’s cloud platform and development tools. In one case, Ryan helped a customer leverage their existing Kafka investments and persist event messages in a durable managed database system in Azure. They chose Azure Cosmos DB, a fully managed, distributed, modern NoSQL database service, as their preferred database …
14/04/2021 • 31 minutes 59 seconds
Automated Cluster Operations in the Cloud ft. Rashmi Prabhu
If you’ve heard the term “clusters,” then you might know that it refers to Confluent components and features that we run in all three major cloud providers today, including an event streaming platform based on Apache Kafka®, ksqlDB, Kafka Connect, the Kafka API, data balancers, and Kafka API services. Rashmi Prabhu, a software engineer on the Control Plane team at Confluent, helps govern the data plane that comprises all these clusters and enables API-driven operations on them. But running operations in the cloud in a scaling organization can be time consuming, error prone, and tedious. This episode addresses manual upgrades and rolling restarts of Confluent Cloud clusters during releases, fixes, experiments, and the like, and, more importantly, the progress made toward switching from manual operations to an almost fully automated process. You’ll get a sneak peek at upcoming plans to make cluster operations a fully automated process using …
12/04/2021 • 24 minutes 41 seconds
Resurrecting In-Sync Replicas with Automatic Observer Promotion ft. Anna McDonald
As most developers and architects know, data always needs to be accessible no matter what happens outside of the system. This week, Tim Berglund virtually sits down with Anna McDonald (Principal Customer Success Technical Architect, Confluent) to discuss how Automatic Observer Promotion (AOP), a feature available in Confluent Platform 6.1 and above, can help solve the 2.5-datacenter dilemma for Apache Kafka®. Many industries must have a backup plan, not only to do the right thing by the data that they collect but because they are required by law to do so. Anna has a knack for designing operations that make replication of data possible both synchronously and asynchronously. To avoid roadblocks in stretch clusters, she’s found that you need to set both a replication factor and a minimum number of in-sync replicas (ISR). You need to plan for multiple copies of your data, not just one, to meet your data protection criteria. Not replicating the correct number of copies across datacenters can mean …
07/04/2021 • 24 minutes 33 seconds
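The interplay Anna describes between replication factor and minimum ISR comes down to simple arithmetic. The sketch below is a simplified model, not Confluent's AOP implementation: it assumes a hypothetical stretch cluster with a replication factor of 4 split evenly across two datacenters, `min.insync.replicas=3`, and producers using `acks=all`, and checks whether writes can still succeed after one datacenter is lost. The datacenter names and helper functions are illustrative only.

```python
# Simplified model of acks=all write availability in a two-datacenter stretch
# cluster. Hypothetical assumptions: replicas are split evenly across two
# datacenters, all replicas are in sync before the outage, and losing a
# datacenter removes every replica hosted there. This is not Confluent's AOP logic.


def can_accept_acks_all_writes(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    # With acks=all, a partition rejects writes when its ISR is smaller than
    # min.insync.replicas (the producer receives a NotEnoughReplicas error).
    return in_sync_replicas >= min_insync_replicas


def isr_after_dc_loss(replicas_per_dc: dict[str, int], failed_dc: str) -> int:
    # Count the replicas that survive when one datacenter goes down.
    return sum(count for dc, count in replicas_per_dc.items() if dc != failed_dc)


placement = {"dc-east": 2, "dc-west": 2}  # hypothetical names; replication factor 4
min_isr = 3  # require at least 3 in-sync copies for every acknowledged write

for dc in placement:
    survivors = isr_after_dc_loss(placement, dc)
    status = "accepted" if can_accept_acks_all_writes(survivors, min_isr) else "rejected"
    print(f"lose {dc}: ISR={survivors}, acks=all writes {status}")
```

Whichever datacenter is lost, only two in-sync replicas remain, so `acks=all` writes are rejected. Lowering `min.insync.replicas` to 2 would keep writes flowing at the cost of weaker durability; as the episode title suggests, AOP instead addresses the gap by promoting caught-up observers into the ISR automatically.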
Building Real-Time Data Pipelines with Microsoft Azure, Databricks, and Confluent
Processing data in real time is a process, as some might say. Angela Chu (Solution Architect, Databricks) and Caio Moreno (Senior Cloud Solution Architect, Microsoft) explain how to integrate Azure, Databricks, and Confluent to build real-time data pipelines that enable you to ingest data, perform analytics, and extract insights from the data at hand. They discuss where to start within the Apache Kafka® ecosystem and how to make the most of the tools and components that it offers using fully managed services like Confluent Cloud for data in motion. EPISODE LINKS: Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure; Azure Data Lake Storage Gen2 introduction
31/03/2021 • 30 minutes 32 seconds
Smooth Scaling and Uninterrupted Processing with Apache Kafka ft. Sophie Blee-Goldman
Availability in Kafka Streams is hard, especially in the face of any changes. Any change to topic metadata or group membership triggers a rebalance. But Kafka Streams struggles even after this stop-the-world rebalance has finished. According to Apache Kafka® Committer and Confluent Software Engineer Sophie Blee-Goldman, this is because a Streams app will generally have some state associated with a given partition, and moving this state from one consumer instance to another requires rebuilding it from a special backing topic called a changelog, the source of truth for a partition’s state. Restoring from the changelog can take hours, and until the state is ready, Streams can’t do any further processing on that partition. Furthermore, it can’t serve any requests for local state until the local state has “caught up” with the changelog. So scaling out your Streams application results in significant downtime—which is a bummer, especially if the reason for scaling …
24/03/2021 • 50 minutes 33 seconds
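Why restoration is slow follows from how a changelog works: the instance that takes over a partition has to replay the changelog records to rebuild the local store, so the work scales with the number of records replayed rather than with the number of live keys. The minimal sketch below models that replay in plain Python. It is a conceptual illustration, not the Kafka Streams API: the record format (a list of `(key, value)` pairs in offset order, with `None` standing for a tombstone) and the `restore_store` function are hypothetical.

```python
# Conceptual model of rebuilding a key-value state store from its changelog.
# Not the Kafka Streams API: records are (key, value) pairs in offset order,
# and a value of None represents a tombstone (a delete).
from typing import Optional


def restore_store(changelog: list[tuple[str, Optional[int]]]) -> dict[str, int]:
    store: dict[str, int] = {}
    for key, value in changelog:  # replay every record in offset order
        if value is None:
            store.pop(key, None)  # tombstone: remove the key if present
        else:
            store[key] = value  # later records overwrite earlier ones
    return store


changelog = [("alice", 1), ("bob", 1), ("alice", 2), ("bob", None), ("carol", 5)]
print(restore_store(changelog))  # {'alice': 2, 'carol': 5}
```

Here five records are replayed to produce a store with two keys. Log compaction on changelog topics limits how much history piles up, but a large store can still take a long time to rebuild, which is why Streams cannot process or serve that partition until restoration finishes.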
Event-Driven Architecture - Common Mistakes and Valuable Lessons ft. Simon Aubury
Event-driven architecture has taken on numerous meanings over the years—from event notification to event-carried state transfer, to event sourcing and CQRS. Why has event-driven programming become so popular, and why is it such a topic of interest? For the first time, Simon Aubury (Principal Data Engineer, ThoughtWorks) joins Tim Berglund on the Streaming Audio podcast to tell all, including his own experiences adopting event-driven technologies and common blunders when working in this area. Simon admits that he’s made some mistakes and learned some valuable lessons that can benefit others. Among these are accidentally building a message bus, realizing that messages are not events, recognizing that getting too fixated on the size of a microservice is the wrong problem, understanding the importance of events and boundaries, defining choreography vs. orchestration, and dealing with passive-aggressive events. This brings Simon to where he is today, as he advocates for …
17/03/2021 • 42 minutes 32 seconds
The Human Side of Apache Kafka and Microservices ft. SPOUD
Many industries depend on real-time data, requiring a range of solutions that Apache Kafka® can help provide. Samuel Benz (CTO) and Patrick Bönzli (Product Owner) explain how their company, SPOUD, has fully embraced Kafka for data delivery, an approach that has proven successful for SPOUD since 2016 across various industries and use cases. The four Kafka use cases that Sam and Patrick see most often are microservices, event processing, event sourcing/the data lake, and integration architecture. But implementing streaming software for each of these areas is not without its challenges. It’s easy to become frustrated by trivial problems that arise when integrating Kafka into the enterprise, because it’s not just about technology but also about people and how they react to a technology that they are not yet familiar with. Should enterprises be scared of Kafka? Why can it be hard to adopt Kafka? How do you drive Kafka adoption internally? All good questions. When adopting Kafka into …
08/03/2021 • 45 minutes 11 seconds
Gamified Fitness at Synthesis Software Technologies Using Apache Kafka and IoT
Synthesis Software Technologies, a Confluent partner, is migrating an existing behavioral IoT framework into Kafka to streamline and normalize vendor information. The legacy messaging technology that they currently use has altered the behavioral IoT data space, and now Apache Kafka® will allow them to take that to the next level. New ways of normalizing the data will increase efficiency for vendors, users, and manufacturers. It will also enable the scaling of IoT technology going forward. Nick Walker (Principal of Streaming) and Yoni Lew (DevOps Developer) of Synthesis discuss how they use Confluent Platform in a personal behavior data pipeline provided by Vitality Group. Vitality Group promotes a shared-value insurance model, which sources behavioral change information and transforms it into personal incentives and rewards for members associated with their global partners. Yoni discusses the motivations for moving their data from an existing product over to …
03/03/2021 • 33 minutes 32 seconds
Becoming Data Driven with Apache Kafka and Stream Processing ft. Daniel Jagielski
When it comes to adopting event-driven architectures, a couple of key considerations often arise: the way that an asynchronous core interacts with external synchronous systems, and the question of “how do I refactor my monolith into services?” Daniel Jagielski, a consultant working as a tech lead/dev manager at VirtusLab for Tesco, recounts how these very themes emerged in his work with European clients. By observing organizations as they pivot toward becoming real time and event driven, Daniel identifies the benefits of using Apache Kafka® and stream processing for auditing, integration, pub/sub, and event streaming. He describes the differences between provisioned and managed clusters and why the distinction matters within the Kafka ecosystem. Daniel also dives into the risk detection platform used by Tesco, which he helped build as a VirtusLab consultant and which marries the asynchronous and synchronous worlds. As Tesco migrated from a legacy platform to …
22/02/2021 • 48 minutes 10 seconds
Integrating Spring Boot with Apache Kafka ft. Viktor Gamov
Viktor Gamov (Developer Advocate, Confluent) joins Tim Berglund on this episode to talk all about Spring and Apache Kafka®. Viktor’s main focus lately has been helping developers build their apps with stream processing, and helping them do so effectively in different languages. Viktor recently hosted an online Spring Boot workshop that turned out to be a lot of fun, so it was time to get him back on the show to talk about this all-important framework and how it integrates with Kafka and Kafka Streams. Spring Boot enables you to do more with less. Its features offer numerous benefits, making it easy to create standalone, production-grade Spring-based applications that you can just run. This pattern has also been part of the Spring framework for a long time. The Spring Integration Framework implements many enterprise integration patterns and also has a pre-built Kafka connector. Spring Boot was heavily inspired by the Twelve-Factor App manifesto, which allows you to write …
17/02/2021 • 45 minutes 8 seconds
Confluent Platform 6.1 | What’s New in This Release + Updates
Confluent Platform 6.1 further simplifies management tasks for Apache Kafka® operators. Based on Apache Kafka 2.7, this release provides even higher availability for enterprises who are using Kafka as the central backbone for their business-critical applications. Confluent Platform 6.1 delivers enhancements that reduce the risk of downtime, simplify operations and streamline the user experience, as well as improve visibility and control with centralized management. EPISODE LINKS: Check out the release notes; Read the blog post: Introducing Confluent Platform 6.1; Download Confluent Platform 6.1; Watch the video version of this podcast
10/02/2021 • 9 minutes 37 seconds
Building a Microservices Architecture with Apache Kafka at Nationwide Building Society ft. Rob Jackson
Nationwide Building Society, a financial institution in the United Kingdom with 137 years of history and over 18,000 employees, relies on Apache Kafka® for their event streaming needs. But how did this come to be? In this episode, Tim Berglund talks with Rob Jackson (Principal Architect, Nationwide) about their Kafka adoption journey as they celebrate two years in production. Nationwide chose to adopt Kafka as a central part of their information architecture in order to integrate microservices. Microservices can't share a database, because that is design-time coupling, and having them call each other synchronously introduces too much runtime coupling. This has led to the rise of event-driven reactive microservices as a stable and extensible architecture for the next generation. Nationwide also chose to use Kafka for the following reasons: to replace their mortgage sales systems from traditional orchestration style to event-driven
08/02/2021 • 48 minutes 54 seconds
Examining Apache Kafka Performance Metrics ft. Alok Nikhil
Coming up with an honest test built on open source tools in an easily documented, replicable environment for a distributed system like Apache Kafka® is not simple. Alok Nikhil (Cloud Native Engineer, Confluent) talks about running Kafka in the cloud and how best to leverage Confluent Cloud for high performance and scalability. His blog post “Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ: Which is the Fastest?” discusses how Confluent tested Kafka’s performance on the latest cloud hardware using research-based methods to answer this question. Alok and Tim talk through the vendor-neutral framework OpenMessaging Benchmark used for the tests, which is Pulsar’s standardized benchmarking framework for event streaming workloads. Alok and his co-author Vinoth Chandar helped improve that framework, evaluated messaging systems in the event streaming space like RabbitMQ, and discussed improvements to those existing platforms. Later in this episode, Al
01/02/2021 • 50 minutes 30 seconds
Distributed Systems Engineering with Apache Kafka ft. Guozhang Wang
Tim Berglund picks the brain of a distributed systems engineer, Guozhang Wang, tech lead in the Streaming department of Confluent. Guozhang explains what compelled him to join the Stream Processing team at Confluent coming from the Apache Kafka® core infrastructure. He reveals what makes the best distributed systems infrastructure engineers tick and how to prepare to take on this kind of role, where solving failure scenarios is a satisfying challenge. One challenge in distributed systems is reaching agreement among multiple nodes in a Kafka cluster when the communication between them is, in practice, asynchronous. Guozhang also shares the newest updates in the Kafka community, including the coming ZooKeeper-free architecture where metadata will be maintained by Kafka logs. Prior to joining Confluent, Guozhang worked for LinkedIn, where he used Kafka for a few years before he started asking himself, “How fast can I get value from the data that I’ve collected?” This questi
25/01/2021 • 44 minutes 52 seconds
Scaling Developer Productivity with Apache Kafka ft. Mohinish Shaikh
Confluent Cloud and Confluent Platform run efficiently largely because of the dedication of the Developer Productivity (DevProd) team, formerly known as the Tools team. Mohinish Shaikh (Software Engineer, Confluent) talks to Tim Berglund about how his team builds the software tooling and automation for the entire event streaming platform and ensures seamless delivery of engineering processes across engineering and the rest of the org. With the right tools and the right data, the DevProd team can understand the overall effectiveness of a development team and its ability to produce results. The DevProd team helps engineering teams at Confluent ship code from commit to end customers actively using Apache Kafka®. The team understands a wide scope of polyglot applications, as well as the complexities of using a diverse technology stack on a regular basis to help solve business-critical problems for the engineering org. The team actively measures ho
20/01/2021 • 34 minutes 19 seconds
Change Data Capture and Kafka Connect on Microsoft Azure ft. Abhishek Gupta
What’s it like being a Microsoft Azure Cloud advocate working with Apache Kafka® and change data capture (CDC) solutions? Abhishek Gupta would know! At Microsoft, Abhishek focuses his time on Kafka, databases, Kubernetes, and open source projects. His experience in a wide variety of roles, spanning engineering, consulting, and product management for developer-focused products, has positioned him well for developer advocacy, where he is now. Switching gears, Abhishek breaks down CDC, starting with core concepts such as "commit logs." He then explains how CDC inverts the traditional way of querying the database to access data: you don't call the database; it calls you. Next, he discusses Debezium, an open source change data capture solution for Kafka. He also covers Azure connectors, Azure Data Explorer, and use cases powered by the Azure Data Explorer Sink Connect
11/01/2021 • 43 minutes 4 seconds
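The CDC inversion from the Azure episode above, "you don't call the database; it calls you," can be illustrated with a toy in-memory table that appends every change to a commit log and pushes change events to subscribers, rather than making consumers poll with queries. This is a minimal sketch: the class and method names are hypothetical, and the event shape (an operation code plus before and after images) only loosely mirrors what CDC tools such as Debezium emit.

```python
from typing import Any, Callable, Dict, List, Optional

ChangeEvent = Dict[str, Any]


class CdcTable:
    """Hypothetical key-value table that logs every change and pushes it to subscribers."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.commit_log: List[ChangeEvent] = []
        self.subscribers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, callback: Callable[[ChangeEvent], None]) -> None:
        self.subscribers.append(callback)

    def _emit(self, op: str, key: str, before: Optional[dict], after: Optional[dict]) -> None:
        event = {"op": op, "key": key, "before": before, "after": after}
        self.commit_log.append(event)  # every change is recorded in order
        for callback in self.subscribers:
            callback(event)  # push: consumers never have to query the table

    def upsert(self, key: str, row: dict) -> None:
        before = self.rows.get(key)
        self.rows[key] = row
        # "c" for a new row, "u" for an update of an existing one.
        self._emit("u" if before is not None else "c", key, before, row)

    def delete(self, key: str) -> None:
        before = self.rows.pop(key, None)
        if before is not None:
            self._emit("d", key, before, None)


received: List[ChangeEvent] = []
table = CdcTable()
table.subscribe(received.append)
table.upsert("42", {"name": "Ada"})
table.upsert("42", {"name": "Ada L."})
table.delete("42")
print([e["op"] for e in received])  # ['c', 'u', 'd']
```

The subscriber never issues a query; it simply reacts to each change as the table reports it, which is the push-based model the episode contrasts with polling.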
Event Streaming Trends and Predictions for 2021 ft. Gwen Shapira, Ben Stopford, and Michael Noll
Coming out of a whirlwind year for the event streaming world, Tim Berglund sits down with Gwen Shapira (Engineering Leader, Confluent), Ben Stopford (Senior Director, Office of the CTO, Confluent), and Michael Noll (Principal Technologist, Office of the CTO, Confluent) to take a guess at what 2021 will bring. The experts share what they believe will happen for analytics, frameworks, multi-cloud services, stream processing, and other topics important to the event streaming space. These Apache Kafka®-related predictions include the future of Kafka cluster partitions and the removal of restrictions that users have run into in the past, such as too many variations and excessive concurrency relative to the number of partitions. Ben also thinks that ZooKeeper will continue to serve as an open source server for highly reliable distributed coordination. Kafka clusters will still be able to keep the most important data while growing in size at record speed with ZooKeeper, althoug
06/01/2021 • 44 minutes 34 seconds
How to Become a Certified Apache Kafka Expert ft. Niamh O’Byrne and Barry Ballard
It’s one thing to know how to use Apache Kafka® and another to prove to the world that you know it. Niamh O’Byrne (Certification Manager, Confluent) and Barry Ballard (Senior Technical Trainer, Confluent) discuss Confluent’s Certification program, including sample test questions, bootcamp, exam details, Kafka training, and getting the necessary practical hands-on experience. It’s no secret that the entire world of work has changed, and now we expect to communicate across a vast number of digital platforms. In this new age, Barry predicts three primary skills that will become more important than ever to employers as they seek to hire a candidate: emotional intelligence, building your personal brand, and digital security knowledge. With emotional intelligence, we're really talking about effective communication and soft skills. This means understanding how to achieve consensus on utilizing digital technology, specifically Apache Kafka, which
28/12/2020 • 43 minutes 36 seconds
Mastering DevOps with Apache Kafka, Kubernetes, and Confluent Cloud ft. Rick Spurgeon and Allison Walther
How do you use Apache Kafka®, Confluent Platform, and Confluent Cloud for DevOps? Integration Architects Rick Spurgeon and Allison Walther share how, including a custom tool they’ve developed for this very purpose. First, Rick and Allison share their perspective of what it means to be a DevOps engineer: mixing development and operations skills to deploy, manage, monitor, audit, and maintain distributed systems. DevOps is multifaceted and can be compared to glue, in which you’re stitching software, services, databases, Kafka, and more together to integrate end-to-end solutions. Using the Confluent Cloud Metrics API (actionable operational metrics), you can pull a wide range of metrics about your cluster, a topic or partition, bytes, records, and requests. The Metrics API is unique in that it is queryable. You can send the API a question such as, “What's the max retained bytes per hour over 10 hours for my topic or my cluster?” and get the answer just like that. To make writin
22/12/2020 • 46 minutes 18 seconds
Apache Kafka 2.7 - Overview of Latest Features, Updates, and KIPs
Apache Kafka® 2.7 is here! Here are the key Kafka Improvement Proposals (KIPs) and updates in this release, presented by Tim Berglund. KIP-497 adds a new inter-broker API to alter in-sync replicas (ISRs). Every partition leader maintains the ISR list, that is, the list of replicas that are in sync with it. KIP-497 is also related to the removal of ZooKeeper. KIP-599 throttles the rate of creating topics, deleting topics, and creating partitions by adding a new quota called the controller mutation rate. KIP-612 adds the ability to limit the connection creation rate on brokers, while KIP-651 supports the PEM format for SSL certificates and private keys. The release of Kafka 2.7 also includes end-to-end latency metrics and sliding windows. Find out what’s new with the Kafka broker, producer, and consumer, and what’s new with Kafka Streams in today’s episode of Streaming Audio!
21/12/2020 • 10 minutes 59 seconds
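To make the idea of a mutation-rate quota from the Kafka 2.7 episode more concrete, here is a generic token-bucket rate limiter in Python. It is only an illustration of the kind of throttling KIP-599 introduces, not Kafka's implementation: the class and method names are hypothetical, and charging one token per created or deleted partition is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical rate limiter: refills `rate` tokens per second, holds at most `burst`."""

    rate: float
    burst: float
    tokens: float = field(init=False)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start full so an idle client can spend its whole burst immediately.
        self.tokens = self.burst

    def _refill(self, now: float) -> None:
        # Add tokens for the time elapsed since the last call, capped at `burst`.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, cost: float, now: float) -> float:
        """Return 0.0 if `cost` tokens were granted, else the seconds to wait before retrying."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return 0.0
        # Not enough tokens: report how long the missing tokens take to refill.
        return (cost - self.tokens) / self.rate


# Example (assumed numbers): 10 partition mutations per second, bursts of up to 20.
bucket = TokenBucket(rate=10.0, burst=20.0)
print(bucket.try_acquire(15, now=0.0))  # 0.0 (granted from the initial burst)
print(bucket.try_acquire(15, now=0.0))  # 1.0 (5 tokens left; 10 more arrive in 1 s)
```

A token bucket lets a client absorb a short burst of mutations while still bounding its long-run rate, and the returned wait time plays the role of a throttle delay.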
Choreographing the Saga Pattern in Microservices ft. Chris Richardson
Chris Richardson, creator of the original Cloud Foundry, maintainer of microservices.io, and author of “Microservices Patterns,” discovered cloud computing in 2006 during an Amazon talk about APIs for provisioning servers. At the time, you could provision 20 servers and pay 10 cents per hour. This blew his mind and led him in 2008 to create the original Cloud Foundry, a PaaS for deploying Java applications on EC2. One of the original Cloud Foundry’s earliest success stories was a digital marketing agency for a beer company that ran a campaign around the Super Bowl. Cloud Foundry enabled them to deploy an application on AWS and then adjust its capacity based on the load, leveraging the elasticity of the cloud back in the ’08–’09 timeframe. SpringSource eventually acquired Cloud Foundry, and VMware later acquired SpringSource. This is where today's Cloud Foundry gets its name. Later in the show, Chris explains what choreographed sagas are, reasons to leve
16/12/2020 • 47 minutes 49 seconds
Apache Kafka and Porsche: Fast Cars and Fast Data ft. Sridhar Mamella
We have all heard of Porsche, but did you know that Porsche uses event streaming with Apache Kafka®? Today, Sridhar Mamella (Platform Manager, Data Streaming Platforms, Porsche) discusses how Kafka’s event streaming technology powers Porsche through Streamzilla. With a modern Porsche carrying 150–200 sensors, Sridhar dives into what Streamzilla is and how it functions with Kafka on premises and in the cloud. He reveals how the first months of event streaming in production went, Porsche’s path to the cloud, Streamzilla's learnings from a developer and a business perspective, and plans for parts of Streamzilla to go open source. Stick around through the end as Sridhar talks through cloud migration, cloud-first strategy, and Porsche’s event streaming use cases. This Streaming Audio is all about speed: fast cars and fast data, an episode you won't want to miss!
07/12/2020 • 42 minutes 59 seconds
Tales from the Frontline of Apache Kafka DevOps ft. Jason Bell
Jason Bell (Apache Kafka® DevOps Engineer, digitalis.io, and Author of “Machine Learning: Hands-On for Developers and Technical Professionals”) delves into his 32-year journey as a DevOps engineer and how he discovered Apache Kafka. He began his voyage in hardware technology before switching over to software development. From there, he got involved in event streaming in the early 2000s, where his love for Kafka started. His first Kafka project involved monitoring Kafka clusters for flight search data, and he's been making magic ever since! Jason first learned about the power of event streaming during Michael Noll’s talk on the streaming API in 2015. It turned out that Michael had written off 80% of Jason’s streaming API jobs with a single talk. As a Kafka DevOps engineer today, Jason works with on-prem clusters and faces challenges like in-sync replicas going down and bringing other developers who are new to Kafka up to speed so that they can eventuall
02/12/2020 • 1 hour 25 seconds
Multi-Tenancy in Apache Kafka ft. Anna Povzner
Multi-tenancy has been quite the topic within the Apache Kafka® community. Anna Povzner, an engineer on the Confluent team, spends most of her time working on multi-tenancy in Kafka in Confluent Cloud. Anna kicks off the conversation with Tim Berglund (Senior Director of Developer Experience, Confluent) by explaining what multi-tenancy is, why it is desirable, and its advantages over single-tenant architecture. By putting more applications and use cases on the same Kafka cluster instead of having a separate Kafka cluster for each individual application and use case, multi-tenancy helps minimize the cost of physical machines and of maintenance. She then switches gears to discuss quotas in Kafka. Quotas are essentially limits: you must set quotas for every tenant (or set up defaults) in Kafka. Anna says it’s always best to start with bandwidth quotas because they’re better understood. Stick around until the end as Anna gives us a sneak peek on what’s ahead f
23/11/2020 • 44 minutes 19 seconds
Distributed Systems Engineering with Apache Kafka ft. Roger Hoover
Roger Hoover, one of the first engineers to work on Confluent Cloud, joins Tim Berglund (Senior Director of Developer Experience, Confluent) to chat about the evolution of Confluent Cloud, all the stages that it’s been through, and the lessons he’s learned on the way. He talks through the days before Confluent Platform was created, and how he contributed to Apache Kafka® to run it on OpenStack (the feature used to separate advertised hostnames from internal hostnames). The Confluent Cloud control plane now runs in over 40 regions. Under the covers, Roger and his team manage tens of thousands of resources at the cloud provider layer. This means creating VPCs, VMs, volumes, and DNS records, as well as managing software artifacts, such as which version of Kafka is running, and user management. Confluent Cloud is a complex application and distributed system spread across the entire world, but Roger reveals how it's done.
18/11/2020 • 50 minutes 24 seconds
Why Kafka Streams Does Not Use Watermarks ft. Matthias J. Sax
Do you ever feel like you’re short on time? Well, good news! Confluent Software Engineer Matthias J. Sax is back to discuss how event streaming has changed the game, making time management simpler yet more efficient. Matthias explains what watermarking is, why Kafka Streams doesn’t use watermarks, and an alternative approach informally called the “slack time approach.” Later, Matthias discusses how “stream time,” which is the maximum timestamp observed, compares to the watermark approach, acting as a kind of high-time watermark. Stick around until the end of the episode, where Matthias reveals other new approaches in the pipeline. Learn how to get the most out of your time on today’s episode of Streaming Audio! EPISODE LINKS: Kafka Summit talk: The Flux Capacitor of Kafka Streams and ksqlDB
12/11/2020 • 52 minutes 20 seconds
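The notion of stream time from the watermarks episode above can be sketched in a few lines. This minimal Python model, with hypothetical names, tracks stream time as the maximum record timestamp observed so far and treats a record as too late for its tumbling window once stream time has reached the window's end plus a grace period. It is a simplified illustration of the slack-time idea, not Kafka Streams' actual code; the epoch-aligned tumbling windows and millisecond units are assumptions for the example.

```python
class StreamTimeTracker:
    """Simplified model: stream time = max timestamp seen; windows close after a grace period."""

    def __init__(self, window_size_ms: int, grace_ms: int) -> None:
        self.window_size_ms = window_size_ms
        self.grace_ms = grace_ms
        self.stream_time = -1  # no records observed yet

    def window_end(self, ts: int) -> int:
        # Tumbling windows aligned to the epoch: [start, start + size).
        start = ts - (ts % self.window_size_ms)
        return start + self.window_size_ms

    def observe(self, ts: int) -> bool:
        """Advance stream time; return True if the record is accepted, False if it is too late."""
        # Stream time only moves forward; out-of-order records never pull it back.
        self.stream_time = max(self.stream_time, ts)
        # A window is closed once stream time reaches its end plus the grace period.
        return self.stream_time < self.window_end(ts) + self.grace_ms


tracker = StreamTimeTracker(window_size_ms=10, grace_ms=5)
for ts in (12, 25, 14, 21):
    print(ts, tracker.observe(ts))
# 12 True, 25 True, 14 False (its window [10, 20) closed once stream time hit 25), 21 True
```

No separate watermark message is needed: the out-of-order record at 14 is judged only against the largest timestamp seen so far, and the grace period is the "slack" that decides how much disorder is tolerated.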
Distributed Systems Engineering with Apache Kafka ft. Apurva Mehta
What's it like being a distributed systems engineer? Apurva Mehta (Engineering Leader, Confluent) explains what attracted him to Apache Kafka®, the challenges and uniqueness of distributed systems, and how to excel in this industry. He dives into the complex math behind the temporal logic of actions (TLA) and talks about his experiences working at Yahoo and LinkedIn, which have prepared him to be where he is today. Apurva also shares what he looks for when hiring someone to join his team. When you're working on a system like Kafka and Kafka Streams, really understanding what your machine is doing, where the bottlenecks are, and how to design improvements to address inefficiencies is critical. EPISODE LINKS: Jason Gustafson discusses TLA validation (and distributed systems engineering in general)
02/11/2020 • 49 minutes 15 seconds
Most Terrifying Apache Kafka JIRAs of 2020 ft. Anna McDonald
It’s Halloween again, which means Anna McDonald (Staff Technical Account Manager, Confluent) is back for another spooktacular episode of Streaming Audio. In this episode, Anna shares six of the most spine-chilling, hair-raising Apache Kafka® JIRAs from the past year. Her job is to help hunt down problems like these and dig up skeletons like: Early death causes epoch time travel; Attack of the clones; Missing snapshot file leads to madness; Shrink inWriteLock time to avoid maiming cluster performance; Older groups are forced to flatline; Ghost segment haunts f
28/10/2020 • 51 minutes 59 seconds
Ask Confluent #18: The Toughest Questions ft. Anna McDonald
It’s the first work-from-home episode of Ask Confluent, where Gwen Shapira (Core Kafka Engineering Leader, Confluent) virtually sits down with Apache Kafka® expert Anna McDonald (Staff Technical Account Manager, Confluent) to answer questions from Twitter. Find out Anna’s favorite Kafka Improvement Proposal (KIP), which will start to use racially neutral terms in the Kafka community and in our code base, as well as answers to the following questions: If you could pick any one KIP from the backlog that hasn't yet been implemented and have it immediately available, which one would you pick? Are we able to arrive at any formula for identifying the consumer/producer throughput rate in Kafka with the given hardware specifications (CPU, RAM, network, and disk)? Does incremental cooperative rebalancing also work for general Kafka consumers in addition to Kafka Connect rebalancing? They also answer how to determine throughput and achieve
21/10/2020 • 33 minutes 46 seconds
Joining Forces with Spring Boot, Apache Kafka, and Kotlin ft. Josh Long
Wouldn’t it be awesome if there was a language as elegant as Spring Boot is as a framework? In this episode of Streaming Audio, Tim Berglund talks with Josh Long, Spring developer advocate at VMware, about Kotlin, the productivity-focused language from our friends at JetBrains, and how it works with Spring Boot to make the experience leaner, cleaner, and easier to use. Josh shares how the Spring and Kotlin teams have worked hard to make sure that Kotlin and Spring Boot are a first-class experience for all developers trying to get to production faster and safer. They also talk about the issues that arise when wrapping one set of APIs with another, as often happens in the Spring Framework: when APIs should leak, when they should not, and how not to try to be a better Kafka Streams when the original is working well enough. EPISODE LINKS: Join the Confluent Community Slack
21/10/2020 • 50 minutes 41 seconds
Building an Apache Kafka Center of Excellence Within Your Organization ft. Neil Buesing
Neil Buesing, an Apache Kafka® community stalwart at Object Partners, spends his days building things out of Kafka and helping others do the same. Today, he discusses the concept of a CoE (center of excellence), and how a CoE is integral to attaining and sustaining world-class performance, business value, and success in a business. Neil talks us through how to make a CoE successful, the importance of event streaming, how to better understand streaming technologies, and how to best utilize a CoE for your needs. This includes evangelizing Kafka, building a Proof of Value (PoV) with team members, defining deliverables as part of that CoE, and understanding how to implement Kafka in your organization. EPISODE LINKS: EoS in Kafka: Listen up, I will only say this once! by Jason Gustafson; The Magical Rebalance Protocol of Apache Kafka by Gwen Shapira
14/10/2020 • 46 minutes 22 seconds
Creating Your Own Kafka Improvement Proposal (KIP) as a Confluent Intern ft. Leah Thomas
Ever wonder what it's like to intern at a place like Confluent? How about working with Kafka Streams and creating your own KIP? Well, that's exactly what we discuss on today's episode with Leah Thomas. Leah, who first interned at Confluent as a recruiter, quickly realized that she was enamored with the problem solving the engineering team was doing, especially with Kafka Streams. The next time she joined Confluent's intern program, she worked on the Streams team and helped bring KIP-450 to life. With KIP-450, Leah started learning Apache Kafka® from the inside out and how to better address the user experience. She discusses her experience with getting a KIP approved with the Apache Software Foundation and how she dove into solving the problem of hopping windows with sliding windows instead. EPISODE LINKS: Range: How Generalists Triumph in a Specialized World
07/10/2020 • 46 minutes 15 seconds
Confluent Platform 6.0 | What's New in This Release + Updates
The feature-rich release of Confluent Platform 6.0, based on Apache Kafka® 2.6, introduces Tiered Storage, Self-Balancing Clusters, ksqlDB 0.10, Admin REST APIs, and Cluster Linking in preview. These features enhance the platform with greater elasticity, improved cost effectiveness, infinite data retention, and global availability so that you can simplify management operations, reduce the cost of adopting Kafka, and focus on building event streaming applications. EPISODE LINKS: Confluent Platform 6.0 Release Notes; Introducing Confluent Platform 6.0; Download Confluent Platform 6.0; Watch the video version of this podcast
01/10/2020 • 14 minutes 11 seconds
Using Event Modeling to Architect Event-Driven Information Systems ft. Bobby Calderwood
Bobby Calderwood (Founder, Evident Systems) discusses event streaming, event modeling, and event-driven architecture. He describes the emerging visual language and process, how to effectively understand and teach what events are, and some of his own use cases in the field with oNote, Evident Systems’ new SaaS platform for event modeling. Finally, Bobby emphasizes the importance of empowering and informing the community on how best to integrate event streaming with the outside world. EPISODE LINKS: Building Information Systems Using Event Modeling; Real-Time Payments with Clojure and Apache Kafka ft. Bobby Calderwood; Event modeling leaders
30/09/2020 • 56 minutes 41 seconds
Using Apache Kafka as the Event-Driven System for 1,500 Microservices at Wix ft. Natan Silnitsky
Did you know that a team of 900 developers at Wix is using Apache Kafka® to maintain 1,500 microservices? Tim Berglund sits down with Natan Silnitsky (Backend Infrastructure Engineer, Wix) to talk all about how Wix benefits from using an event streaming platform. Wix (the website that’s made for building websites) is designing a platform that gives people the freedom to create, manage, and develop their web presence exactly the way they want as they look to move from synchronous to asynchronous messaging. In this episode, Natan and Tim talk through some of the vital lessons learned at Wix through their use of Kafka, including common infrastructure, at-least-once processing, message queuing, and monitoring. Finally, Natan gives Tim a brief overview of the open source project Greyhound and how it's being used at Wix. EPISODE LINKS: github.com/wix/greyhound
21/09/2020 • 49 minutes 12 seconds
Top 6 Things to Know About Apache Kafka ft. Gwen Shapira
This year, Confluent turns six! In honor of this milestone, we are taking a very special moment to celebrate with Gwen Shapira by highlighting the top six things everyone should know about Apache Kafka®: (1) clients have metrics, (2) bug fix releases and Kafka Improvement Proposals (KIPs), (3) idempotent producers and how they work, (4) Kafka Connect is part of Kafka, and Single Message Transforms (SMTs) are not to be missed, (5) cooperative rebalancing, and (6) generating sequence numbers and how Kafka changes the way you think. Listen as Tim and Gwen talk through the importance of Kafka Connect, cooperative rebalancing protocols, and the promise (and warning) that your data architecture will never be the same. As Gwen puts it, “Kafka gives you the options, but it's up to you how you use it.”
15/09/2020 • 47 minutes 27 seconds
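Idempotent producers, one of the six topics in the episode above, rest on a simple idea: the producer tags each write with a producer ID and a per-partition sequence number, and the broker appends a write only if its sequence number is exactly one past the last one it accepted from that producer. Retried duplicates are acknowledged without being appended again, and gaps are rejected. The Python model below is a deliberately simplified sketch of that idea; the class and exception names are hypothetical, and real brokers work on record batches and keep more state than this.

```python
from typing import Dict, List


class OutOfOrderSequence(Exception):
    """Raised when a sequence number skips ahead, meaning an earlier write is missing."""


class PartitionLog:
    """Toy partition that deduplicates writes by (producer_id, sequence)."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.last_seq: Dict[int, int] = {}  # producer_id -> last accepted sequence

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        """Return True if the write was appended, False if it was a duplicate retry."""
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # Already written (for example, the ack was lost and the producer retried).
            return False
        if seq != last + 1:
            raise OutOfOrderSequence(f"expected {last + 1}, got {seq}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


log = PartitionLog()
log.append(producer_id=1, seq=0, value="a")  # True: first write
log.append(producer_id=1, seq=0, value="a")  # False: retry is not duplicated
log.append(producer_id=1, seq=1, value="b")  # True
print(log.records)  # ['a', 'b']
```

Because deduplication keys on the producer's own sequence numbers, a retry after a lost acknowledgment is safe, which is what turns at-least-once sending into exactly-once writes to a single partition in this model.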
5 Years of Event Streaming and Counting ft. Gwen Shapira, Ben Stopford, and Michael Noll
With the explosion of real-time data, Apache Kafka and event stream processing (ESP) have proliferated, with event streaming becoming the de facto technology transforming businesses across numerous verticals. Gwen Shapira (Engineering Leader, Confluent), Ben Stopford (Senior Director, OCTO, Confluent), and Michael Noll (Principal Technologist, Confluent) meet up to talk all about their last five years at Confluent and the changes they’ve seen in event streaming. They discuss what they were doing with Apache Kafka® before they arrived at Confluent, event streaming challenges that have arisen, and their favorite use cases. They then talk through what they think the Kafka community is undervaluing and where they think event streaming will be in the next five years. EPISODE LINKS: Tim’s Budapest Drone Footage
31/08/2020 • 48 minutes 18 seconds
Championing Serverless Eventing at Google Cloud ft. Jay Smith
Jay Smith helps Google Cloud users modernize their applications with serverless eventing. This lets them focus on their code instead of managing infrastructure, and gives them ultra-fast deployments and reduced server costs. On today’s show, he discusses the definition of serverless, serverless eventing, data-driven vs. event-driven architecture, sources and sinks, and hybrid cloud with on-prem components. Finally, Jay shares how he sees application architecture changing in the future and where Apache Kafka® fits in. EPISODE LINKS: Quine Programs; Get Started with Qwiklabs; Kubernetes Podcasts; Join the Confluent Community Slack
26/08/2020 • 47 minutes 26 seconds
Disaster Recovery with Multi-Region Clusters in Confluent Platform ft. Anna McDonald and Mitch Henderson
Multi-Region Clusters improve high availability in Apache Kafka®, ensure cluster replication across multiple zones, and help with disaster recovery. Making sure users are successful in every area of their Kafka deployment, be it operations or application development for specific use cases, is what Anna McDonald (Team Lead Customer Success Technical Architect) and Mitch Henderson (Principal Customer Success Technical Architect) are passionate about here at Confluent. In this episode, they share common challenges that users often run into with Multi-Region Clusters, use cases for them, and what to keep in mind when considering replication. Anna and Mitch also discuss consuming from followers, auto client failover, and offset issues to be aware of.
17/08/2020 • 43 minutes 4 seconds
Developer Advocacy (and Kafka Summit) in the Pandemic Era
All Confluent developer advocates...assemble! COVID-19 has changed the face of meetings and events, halting all in-person gatherings and forcing companies to adapt on the fly. In today's episode of Streaming Audio, the developer advocates come together to discuss how their jobs have changed during the worldwide pandemic. Less than a year ago, this group was constantly on the road or in a plane on their way to present something new about Apache Kafka and event streaming, so how has the current climate affected their work? The group talks about Zoom fatigue, online presenting, online conferences/meetups, and of course, Kafka Summit 2020. EPISODE LINKS: Growing the Event Streaming Community During COVID-19 ft. Ale Murray
12/08/2020 • 41 minutes 44 seconds
Apache Kafka 2.6 - Overview of Latest Features, Updates, and KIPs
Apache Kafka® 2.6 is out! This release includes progress toward removing the ZooKeeper dependency, client quota APIs in the admin client, newly exposed disk read and write metrics, and support for Java 14. In addition, there are improvements to Kafka Connect, such as allowing source connectors to set topic-specific settings for new topics and expanding Connect worker internal topic settings. Kafka 2.6 also augments metrics for Kafka Streams and adds emit-on-change support for Kafka Streams, as well as other updates. EPISODE LINKS: Watch the video version of this podcast; Read about what's new in Apache Kafka 2.6; Join the Confluent Community Slac
06/08/2020 • 10 minutes 37 seconds
Testing ksqlDB Applications ft. Viktor Gamov
Viktor Gamov (Developer Advocate, Confluent) returns to Streaming Audio to explain the magic of ksqlDB, ideal testing environments for ksqlDB, and the ksqlDB test runner. For those who are just starting to explore the interface, Viktor provides some tips and best practices for what to look out for too. He also talks about the future of ksqlDB, the future of integration testing, and his favorite new feature among recent upgrades. EPISODE LINKS: Streaming Audio episodes on ksqlDB; Watch #LiveStreams with Viktor Gamov; I Don't Always Test My Streams; Join the Confluent Community Slack
03/08/2020 • 39 minutes 36 seconds
How to Measure the Business Value of Confluent Cloud ft. Lyndon Hedderly
As developers, we are good at envisioning the future state of any given system we want to build, but are we as good at telling the business how those changes positively impact the bottom line? Lyndon Hedderly (Team Lead, Business Value Consulting, Confluent) describes his approach to business value, how to justify a new technology that you’re introducing to your company, and tips on adopting new technologies and processes effectively. As Lyndon walks through each part of the business value framework, namely (1) baseline, (2) target state, (3) quantified benefits, (4) unquantified benefits, and (5) proof points, you’ll learn about cost effectiveness with Confluent Cloud, how to measure ROI vs. TCO, and a retail example from a customer that details their implementation of an event streaming platform. EPISODE LINKS: Measuring the Cost Effectiveness of Confluent Cloud
27/07/2020 • 54 minutes 29 seconds
Modernizing Inventory Management Technology ft. Sina Sojoodi and Rohit Kelapure
Inventory management systems are crucial for reducing real-time inventory data drift, improving customer experience, and minimizing out-of-stock events. Apache Kafka®’s real-time data technology provides seamless inventory tracking at scale, saving billions of dollars across the supply chain, which makes modernized data architectures more important to retailers than ever. In this episode, we’ll discuss how Apache Kafka enables stateful event streaming architectures on a cloud-native platform for application and architecture modernization. Sina Sojoodi (Global CTO, Data and Architecture, VMware) and Rohit Kelapure (Principal Advisor, VMware) discuss data modeling, as well as the architecture design needed to achieve data consistency and correctness while handling the scale and resilience needs of a major retailer in near real time. The implemented solution uses Spring Boot, Kafka Streams, and Apache Cassandra, and they explain the p…
20/07/2020 • 41 minutes 32 seconds
Fault Tolerance and High Availability in Kafka Streams and ksqlDB ft. Matthias J. Sax
Apache Kafka® Committer and PMC member Matthias J. Sax explains fault tolerance, high-availability stream processing, and how both are achieved in Kafka Streams. He discusses the differences between changelogging and checkpointing and the complexities checkpointing introduces. From there, Matthias explains what hot standbys are and how they are used in Kafka Streams, why Kafka Streams doesn’t do watermarking, and finally, why Kafka Streams is a library and not infrastructure. EPISODE LINKS: Ask Confluent #7: Kafka Consumers and Streams Failover Explained ft. Matthias Sax; Ask Confluent #8: Guozhang Wang on Kafka Streams Standby Tasks
15/07/2020 • 54 minutes 3 seconds
Benchmarking Apache Kafka Latency at the 99th Percentile ft. Anna Povzner
Real-time stock trades, GPS location, and website click tracking are just a few of the use cases that rely heavily on Apache Kafka®'s real-time messaging and data delivery functions. As such, Kafka's latency is incredibly important. Anna Povzner (Software Engineer, Confluent) breaks down everything you need to know about measuring latency. The five components of end-to-end latency are produce time, publish time, commit time, catch-up time, and fetch time. Because the way consumers pull data also adds to latency, Anna shares some best practices for thinking about partitioning in conjunction with latency. She also discusses client configuration in the cloud, interesting problems she's helped customers solve, and her top two tips for debugging latency.
08/07/2020 • 46 minutes 30 seconds
Open Source Workflow Automation with Apache Kafka ft. Bernd Ruecker
Camunda started out as a consulting company and eventually became a developer-friendly open source vendor that now focuses on workflow automation. Bernd Ruecker, a co-founder and the chief technologist at Camunda, talks through the company's journey, how he ended up in open source, and all things automation, including how workflow automation differs from business process management, and the issue of diagrams. Bernd also dives into dead letter topics in Apache Kafka®, software interacting with software, orchestration tension, and best practices for approaching challenges that pop up along the way. This episode also offers a thorough introduction to Camunda Cloud, a cloud-native workflow engine, as well as Camunda’s Kafka connector. EPISODE LINKS: Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft; zeebe.io
29/06/2020 • 43 minutes 3 seconds
Growing the Event Streaming Community During COVID-19 ft. Ale Murray
We've all been affected by COVID-19 in one way or another, resulting in big changes in workplace functionality, productivity, and even our relationships within the Apache Kafka® and Confluent communities as meetings and events have moved online. Ale Murray (Global Community Manager, Confluent) shares interesting trends, changes in community metrics, and how we’ve adapted in response. Ale also explains what makes a comprehensive community program and the value of community meetups in light of the pandemic. Although we miss in-person interactions, by moving events online and focusing on the community, we saw strong growth in attendance and engagement across our Slack community, online hackathons, MVP program, and online meetups over the last couple of months, proving that nothing can stop this amazing community from thriving.
24/06/2020 • 40 minutes 19 seconds
From Monolith to Microservices with Sam Newman
Author Sam Newman catches up with Tim Berglund (Senior Director of Developer Advocacy, Confluent) in the virtual studio to discuss what microservices are, how they work, their drawbacks, what splitting the monolith looks like, and patterns to look for. The pair talk through Sam's book “Monolith to Microservices” chapter by chapter, looking at key components of microservices in more detail. Sam also walks through database decomposition, integrating with new technology, and performing joins in an event streaming architecture. Lastly, Sam shares what he’s excited about for the future, which includes “Monolith to Microservices Volume II.” EPISODE LINKS: Monolith to Microservices; Join the Confluent Community Slack
17/06/2020 • 40 minutes 27 seconds
Exploring Event Streaming Use Cases with µKanren ft. Tim Baldridge
Tim Baldridge (Senior Software Engineer, Cisco) joins us on Streaming Audio to talk about event streaming, stream processing use cases, and µKanren. First, Tim talks about his work at Cisco on ingesting viruses, the backend, and finding new ways to process data. Later, Tim discusses interesting bank and airline use cases, as well as his time at Walmart, taking a closer look at specific retail use cases and the product that Walmart used to process data streams. If you’re curious about what µKanren is, how it relates to relational programming, the complex math behind the workflow of µKanren, and how Apache Kafka® holds up against other event streaming platforms, Tim dives into that too. EPISODE LINKS: µKanren: A Minimal Functional Core for Relational Programming; It's Actors All The Way Down
08/06/2020 • 51 minutes
Introducing JSON and Protobuf Support ft. David Araujo and Tushar Thole
Confluent Platform 5.5 introduces long-awaited JSON Schema and Protobuf support in Confluent Schema Registry and across other platform components. Support for Protobuf and JSON Schema in Schema Registry provides the same assurances of data compatibility and consistency we already had with Avro, while opening up Kafka to more businesses, applications, and use cases built on those data serialization formats. Tushar Thole (Engineering Leader, Confluent) and David Araujo (Product Manager, Confluent) discuss these improvements to Confluent Schema Registry, the differences between Apache Avro™, Protobuf, and JSON Schema, how to handle optional fields, some of the arguments for Avro versus Protobuf, and why it took some time for Schema Registry to support JSON Schema and Protobuf. Later, they talk about custom plugins, adding another layer of safety in Confluent Platform 5.5, and their vision for data governance.
01/06/2020 • 40 minutes
Scaling Apache Kafka in Retail with Microservices ft. Matt Simpson from Boden
Apache Kafka® is a powerful toolset for microservice architectures. In this episode, we’ll cover how Boden, an online retail company that specializes in high-end fashion linked to the royal family, used streaming microservices to modernize its business. Matt Simpson (Solutions Architect, Boden) shares a real-life use case showing how Kafka has helped Boden digitize its business: transitioning from catalogs to online sales, tracking stock, and identifying buying patterns. Matt also shares what he's learned from using Kafka, as well as the challenges of being a product master. And lastly, what is Matt excited about for the future of Boden? Find out in this episode! EPISODE LINKS: Digital Transformation in Style: How Boden Innovates Retail Using Apache Kafka; Learn about Boden
27/05/2020 • 42 minutes 1 second
Connecting Snowflake and Apache Kafka ft. Isaac Kunen
Isaac Kunen (Senior Product Manager, Snowflake) and Tim Berglund (Senior Director of Developer Advocacy, Confluent) practice social distancing by meeting up in the virtual studio to discuss all things Apache Kafka® and Kafka Connect at Snowflake. Isaac shares what Snowflake is, what it accomplishes, and his experience developing connectors. The pair discuss the Snowflake Kafka Connector, some of the unique challenges and adaptations it has had to undergo, and the interesting history behind it. In addition, Isaac talks about how Snowflake is taking on event streaming by implementing the Kafka connector and what he hopes to see in future Kafka releases. EPISODE LINKS: Download the Snowflake Kafka Connector
20/05/2020 • 31 minutes 46 seconds
AMA with Tim Berglund | Streaming Audio Special
Happy 100th episode of Streaming Audio! Thank you to everyone who has listened, subscribed, left a review, and, most of all, shared our passion for event streaming. We can't wait for the next 100! To celebrate, Ben Stopford (Senior Director of the Office of the CTO, Confluent) hosts an AMA (ask me anything) with Tim, covering 62 questions in total: from his career, his time at Confluent, Marvel vs. DC, and what he looks for in a new hire, to how to nail your next conference talk. We hope you enjoy this special 100th episode of Streaming Audio: a podcast about Apache Kafka®, Confluent, and the cloud. EPISODE LINKS: The Song of the Strange Ascetic; Avoiding Lock-In
18/05/2020 • 47 minutes 9 seconds
Kubernetes Meets Apache Kafka ft. Kelsey Hightower
Kelsey Hightower was an advocate, as all developers are, long before he officially joined Google as a developer advocate and Kubernetes expert. Gaining trust in your product, your process, and the way you develop code requires the ability to explain those things well. Kelsey reflects on the journey that brought him to where he is today and on how Kubernetes has evolved over the years, including what makes it so successful. But Tim is not the only one with questions. Kelsey asks a few of his own: Does Apache Kafka® want to be a database? Does Kafka want to be a system of record? Is there overlap between Kubernetes and Kafka? Can you run Kafka on Kubernetes? EPISODE LINKS: Kubernetes the Hard Way; Join the Confluent Community Slack
13/05/2020 • 42 minutes 2 seconds
Apache Kafka Fundamentals: The Concept of Streams and Tables ft. Michael Noll
If you’ve ever wondered what Apache Kafka® is or what it’s used for, or wanted to learn about Kafka architecture and all its components, buckle up! In today’s episode, Michael Noll (Principal Technologist, Confluent) and Tim Berglund (Senior Director of Developer Advocacy, Confluent) discuss a series of fundamental questions: What is Kafka? What is an event? How do we organize and store events? And what is Kafka Streams? Over the course of this episode, Michael takes an in-depth look at Kafka technology and core concepts: the process of reading from a topic, the differences between tables and streams, mutability, and what ksqlDB is and what its event streaming database features accomplish. If you've ever wanted a better grasp of how Kafka works, this episode is for you!
04/05/2020 • 48 minutes 52 seconds
IoT Integration and Real-Time Data Correlation with Kafka Connect and Kafka Streams ft. Kai Waehner
There are two primary segments within the Internet of Things (IoT): industrial IoT (IIoT) and consumer IoT (CIoT), both of which can benefit from the Apache Kafka® ecosystem, including Kafka Streams and Kafka Connect. Kai Waehner, who works with customers in Confluent's advanced technology group to define their needs, use cases, and architecture, shares examples of IoT integration he’s seen in action. He focuses specifically on Walmart and its real-time customer integration through the Walmart app. Kafka Streams helps fine-tune the Walmart app, optimizing the user experience, offering a seamless omni-channel experience, and contributing to business success. Other topics discussed in today’s episode include integration from various legacy and modern IoT data sources, latency sensitivity, machine learning for quality control and predictive maintenance, and when event streaming can be more useful than traditional databases or data lakes.
29/04/2020 • 40 minutes 55 seconds
Confluent Platform 5.5 | What's New in This Release + Updates
Confluent Platform 5.5 is out, and Tim Berglund (Senior Director of Developer Advocacy, Confluent) is here to give you the latest updates! The first is improved schema management, with Confluent Schema Registry support for Protobuf and JSON Schema and schema formats made pluggable. The second is better support for languages other than Java through the librdkafka-based clients. And finally, this release includes an upgrade to ksqlDB, which expands its functionality, supports more data types, increases availability for pull queries, and adds a new aggregate function. EPISODE LINKS: Confluent Platform 5.5 Release Notes
24/04/2020 • 11 minutes 20 seconds
Making Abstract Algebra Count in the World of Event Streaming ft. Sam Ritchie
During his time at Twitter, Sam Ritchie (Staff Research Engineer, Google) led the development of Summingbird, a project that helped Twitter ingest and process massive amounts of data. It relieved some key pain points, sparing Twitter developers from doing the same work twice, a natural consequence of the Lambda Architecture in use at the time. In this episode, Sam teaches us some abstract algebra and explains how it has informed his attempts to make stream processing programs easy to write in a more general way. EPISODE LINKS: Check out Summingbird; Join the Confluent Community Slack; Learn about Kafka at Confluent Developer
22/04/2020 • 46 minutes 21 seconds
Apache Kafka 2.5 – Overview of Latest Features, Updates, and KIPs
Apache Kafka® 2.5 is here, and we’ve got some Kafka Improvement Proposals (KIPs) to discuss! Tim Berglund (Senior Director of Developer Advocacy, Confluent) walks through more than 10 KIPs across Kafka Core, Kafka Connect, and Kafka Streams, including foundational improvements to exactly-once semantics, the ability to track a connector’s active topics, and a new co-group operator in the Streams DSL. EPISODE LINKS: Check out the Apache Kafka 2.5 release notes; Read about what’s new in Apache Kafka 2.5; Watch the video version of this podcast; Jo…
16/04/2020 • 10 minutes 28 seconds
Streaming Data Integration – Where Development Meets Deployment ft. James Urquhart
Applications, development, deployment, and theory are all key pieces behind customer experience, event streaming, and improving systems and integration. James Urquhart (Global Field CTO, VMware) is writing a book that combines Wardley Mapping and Promise Theory to evaluate the future of event streaming and how it will become a more economical choice for users. James argues that reducing the cost of integration does not deter people from buying but instead encourages them to find more uses for integration. He stresses the importance of user experience and how understanding what users are going through helps improve products and workflows, which in turn improves systems that deliver economic value. The two then explain Promise Theory, the Jevons paradox, and Geoffrey Moore's Core vs. Context theory. EPISODE LINKS: Promise Theory: Principles and Applications
15/04/2020 • 55 minutes 2 seconds
How to Run Kafka Streams on Kubernetes ft. Viktor Gamov
There’s something about YAML and the word “Docker” that doesn’t quite sit well with Viktor Gamov (Developer Advocate, Confluent). But Kafka Streams on Kubernetes is a phrase that does. Kubernetes is an open source platform that allows teams to deploy, manage, and automate containerized services and workloads. Running Kafka Streams on Kubernetes simplifies operations and gets your environment allocated faster. Viktor describes what that process looks like and how Jib helps build, test, and deploy Kafka Streams applications on Kubernetes for an improved DevOps experience. He also talks about some exciting projects he’s currently working on. EPISODE LINKS: Installing Apache Kafka® with Ansible ft. Viktor Gamov and Justin Manchester
06/04/2020 • 41 minutes 49 seconds
Cloud Marketplace Considerations with Dan Rosanova
The fundamental data abstractions developers use have changed over time, and event streams are now both the present and the future. Drawing on decades of experience in messaging, Dan Rosanova (Senior Group Product Manager for Confluent Cloud, Confluent) discusses the pros and cons of cloud event streaming services on Google Cloud Platform (GCP), Microsoft Azure, and Confluent Cloud. He also compares major stream processing and messaging services (Cloud Pub/Sub, Azure Event Hubs, and Confluent Cloud) and outlines the major differences among them. Also on the table in today’s episode are cloud lock-in, the anxieties around it, and where cloud marketplaces are headed. EPISODE LINKS: Don’t Get Locked Up in Avoiding Lock-In; Join the Confluent Community Slack
30/03/2020 • 33 minutes 31 seconds
Explore, Expand, and Extract with 3X Thinking ft. Kent Beck
Kent Beck chats about various topics of broad interest to developers, including some of his books: “Extreme Programming Explained: Embrace Change,” “Test-Driven Development: By Example,” and “Implementation Patterns.” He wrote “Implementation Patterns” to highlight the positive habits a developer should form in order to write accessible code. He also describes what it’s like to experiment with new ideas and implement them, especially when others doubt what you're trying to achieve. This relates to the concept behind the explore-to-expand transition and a short piece he wrote titled "Idea to Impact." Finally, Tim and Kent talk through the difference between refactoring and tidying, Kent's involvement with agile software development and test-driven development, and what exactly test-commit-revert is. And yes, they talk a little bit about event streaming too!
25/03/2020 • 54 minutes 45 seconds
Ask Confluent #17: The “What is Apache Kafka?” Episode ft. Tim Berglund
Ask Confluent is back! From questions on Apache Kafka®, data integration, and log aggregation, to potential interview questions that Tim would ask if he were to interview himself, anything goes. If you're already a Kafka expert (or any type of expert), think about becoming a speaker. Gwen and Tim talk through how to submit a proposal and get it accepted at conferences. As experienced conference goers, they explain that a successful talk is one you present for the attendees rather than about yourself. In essence, what can your idea or code do to help someone else? From there, the pair chat about the secret to a long marriage, REST Proxy and where it fits in Confluent Operator, how Kafka relates to Splunk when aggregating logs, and whether Tim can start making use-case-based video content so that people can better understand Kafka and how it works. For those who have just started integrating Kafka, Tim and Gwen also provide some point…
24/03/2020 • 25 minutes 35 seconds
Domain-Driven Design and Apache Kafka with Paul Rayner
Domain-driven design (DDD) is helpful for managing complex processes and rules, especially those shared between business experts and developers or users, and turning them into models. Paul Rayner, CEO of Virtual Genius, describes how DDD's extensive tooling lets developers focus on the code that really matters and makes systems more collaborative, built around three primary considerations: (1) how to get better at collaborating, (2) strategic design and understanding why design really matters, and (3) modeling code. He also touches on bounded contexts, microservices, event storming, event sourcing, and the relationship between Apache Kafka® and DDD. EPISODE LINKS: What is Domain-Driven Design?
18/03/2020 • 50 minutes 42 seconds
Machine Learning with TensorFlow and Apache Kafka ft. Chris Mattmann
TensorFlow is an open source machine learning platform that can be used with Apache Kafka® for deep learning. Chris Mattmann, author of Machine Learning with TensorFlow, introduces TensorFlow as a Google technology that teaches computers how to think and make connections the way humans do. For example, when the mind processes a signifier, out comes a label for the object in front of you. TensorFlow is Google's way of bringing various technologies together so that they work smoothly as large amounts of data flow through. Chris also breaks down neural networks, how they simulate the processes that take place when our visual cortex receives a new image, and a use case that uses Apache Kafka and event streaming to achieve TensorFlow's goals. EPISODE LINKS: Ask Confluent #13: Machine Learning with Kai…
11/03/2020 • 53 minutes 6 seconds
Distributed Systems Engineering with Apache Kafka ft. Gwen Shapira
Gwen Shapira, an engineering leader who manages a team, talks through the steps she took to get to Confluent and how she started working with Apache Kafka®. She shares what it's like to be on the Project Management Committee (PMC) at the Apache Software Foundation, as well as some of the responsibilities involved, such as choosing Kafka Improvement Proposals (KIPs), monitoring releases, and making contributions to the community. For Gwen, part of finding Kafka was her willingness to take risks, learn all types of code bases, and leave companies for a new technology that showed promise and sparked her interest. Since both Kafka itself and the way people learn it have changed, Gwen shares her best tips for approaching the project. There are differences between distributed systems engineers and full stack engineers, and for anyone who wants to work at a company like Confluent, it’s important to showcase design and architecture knowledge, a kna…
04/03/2020 • 48 minutes 26 seconds
Towards Successful Apache Kafka Implementations ft. Jakub Korab
From stream processing and real-time data analytics to adding business value, Professional Services helps customers thrive with their chosen software or products and ultimately succeed as digital enterprises. Jakub Korab, a solutions architect on the Professional Services team at Confluent, discusses what Professional Services actually is and how it relates to customer success. It all centers on what customers want to do, and you’ll hear about trends, Apache Kafka® use cases, and real-life examples of Professional Services in action across various industries over the last year. EPISODE LINKS: Understanding Message Brokers by Jakub Korab; Apache Camel Developer's Cookbook by Jakub Korab
26/02/2020 • 55 minutes 3 seconds
Knative 101: Kubernetes and Serverless Explained with Jacques Chester
What is Knative, and how does it simplify Kubernetes-related processes by seamlessly extending Kubernetes? Jacques Chester (Software Engineer, VMware) is publishing a book called “Knative in Action” that walks through the problems Knative is trying to solve. You don’t need to be an expert to fully understand Knative, so get hands-on and see what you can do with it! You also don't need to be a Kubernetes expert to read the book, but some experience with Kubernetes can help you get Knative working with your software more quickly. This episode will help you understand the relationship between Knative and serverless and how Knative can simplify working with your Kubernetes cluster. EPISODE LINKS: Learn more about Knative
19/02/2020 • 47 minutes 13 seconds
Paving a Data Highway with Kafka Connect ft. Liz Bennett
The Stitch Fix team benefits from a centralized data integration platform at scale using Apache Kafka and Kafka Connect. Liz Bennett (Software Engineer, Confluent) played a key role in building their real-time data streaming infrastructure. Liz explains how she implemented Apache Kafka® at Stitch Fix, her previous employer, where she successfully introduced Kafka first through a Kafka hackathon and then by pitching it to the management team. Her first piece of advice? Give it a cool name like The Data Highway. As part of the process, she prepared a detailed document proposing a Kafka roadmap, which eventually led to a meeting with management about how they would successfully integrate the product (spoiler: it worked!). If you’re curious about the pros and cons of Kafka Connect, its self-service aspect, how it scales, metrics, how it helps data scientists, and more, this is your episode! You’ll also get to hear what Liz thinks her biggest win with Kafka has…
12/02/2020 • 46 minutes 1 second
Distributed Systems Engineering with Apache Kafka ft. Jun Rao
Jun Rao (Co-founder, Confluent) explains what relational databases and distributed databases are, how they work, and the major differences between the two. He also delves into important lessons he learned while transitioning from the relational world to the distributed world. He outlines three fundamental traits a distributed systems engineer needs to be successful at a place like Confluent: curiosity and knowledge, care in code development, and being open-minded and collaborative. You may even find that the people with the best answers to your problems aren't even at your company! Originally from China, Jun moved to the U.S. for his Ph.D. and eventually landed at IBM's research labs. He worked there for over 10 years before moving to LinkedIn, where Apache Kafka® was first being developed and implemented. EPISODE LINKS: Get 30% off Ka…
05/02/2020 • 54 minutes 59 seconds
How to Write a Successful Conference Abstract | Streaming Audio Special
Learn how to write an abstract for conference submissions and calls for papers with tips from Tim Berglund, chair of the Kafka Summit Program Committee. Whether you're giving a talk for the very first time or you consider yourself an experienced speaker, these guidelines will help you craft a strong story that stands out from the others. EPISODE LINKS: Join #summit-office-hours on the Confluent Community Slack; Sign up to speak at a meetup; Watch the video version of this podcast; Get 30% off Kafka Summit London registration with the code KSL20; Audio…
04/02/2020 • 7 minutes 40 seconds
Streaming Call of Duty at Activision with Apache Kafka ft. Yaroslav Tkachenko
Call of Duty: Modern Warfare is the most-played Call of Duty multiplayer game of this console generation, with over $1 billion in sales and almost 300 million multiplayer matches. Behind the scenes, Yaroslav Tkachenko (Software Engineer and Architect, Activision) is on the team architecting, designing, and implementing Activision's next-generation event streaming platform, including a large-scale, near-real-time streaming data pipeline built with Kafka Streams and Kafka Connect. Learn how his team ingests huge amounts of data, what the backend of their massive distributed system looks like, and the automated services involved in collecting data from each pipeline.
27/01/2020 • 46 minutes 43 seconds
Confluent Platform 5.4 | What's New in This Release + Updates
A quick summary of new features, updates, and improvements in Confluent Platform 5.4, including Role-Based Access Control (RBAC), Structured Audit Logs, Multi-Region Clusters, Confluent Control Center enhancements, Schema Validation, and the preview of Tiered Storage. This release also includes pull queries and embedded connectors in preview as part of KSQL. EPISODE LINKS: Confluent Platform 5.4 Release Notes; Introducing Confluent Platform 5.4
22/01/2020 • 14 minutes 26 seconds
Making Apache Kafka Connectors for the Cloud ft. Magesh Nandakumar
Magesh Nandakumar (Software Engineer, Confluent), who previously focused on Confluent Schema Registry and now builds connectors for Confluent Cloud, discusses what connectors do, how they simplify data integrations, and how they enable sophisticated customer use cases. Connectors built for Confluent Cloud on Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS) make it easy for users to bring Apache Kafka® into their existing systems. There’s a lot that Magesh is looking forward to as the world of connectors and the world of cloud collide. EPISODE LINKS: Why Kafka Connect? ft. Robin Moffatt; Join the Confluent Community Slack; Fully managed Apache Kafka as a service! Try free.
13/01/2020 • 25 minutes 19 seconds
Location Data and Geofencing with Apache Kafka ft. Guido Schmutz
One way to put Apache Kafka into action is through geofencing and tracking the location data of objects, barges, and cars in real time. Guido Schmutz (Principal Consultant, Trivadis) shares one such use case involving a German steel company and the development project he worked on for them, which he featured in a talk at Berlin Buzzwords. EPISODE LINKS: Location Analytics – Real-Time Geofencing Using Kafka (Video); Location Analytics – Real-Time Geofencing Using Kafka (Slides); Join the Confluent Community Slack; Get 30% off Kafka Summit London registration with the code KSL20; Audio…
08/01/2020 • 48 minutes 20 seconds
Multi-Cloud Monitoring and Observability with the Metrics API ft. Dustin Cote
The role of monitoring hosted services is evolving, but SaaS has always offered the ability to let go of the details and get what you are paying for. Dustin Cote (Product Manager for Observability, Confluent Cloud) talks about serverless Apache Kafka® and how, beyond just the brokers, Confluent Cloud focuses on fitting into customers' systems rather than building monitoring silos. When it comes to monitoring, logging, tracing, and alerting, Dustin defines what each means and how they operate in a database before diving into the requirements for a properly observable cloud service, as well as an on-prem one. EPISODE LINKS: Confluent Cloud Metrics API documentation
30/12/2019 • 42 minutes 19 seconds
Apache Kafka and Apache Druid – The Perfect Pair ft. Rachel Pedreschi
As the head of global field engineering and community at Imply, Rachel Pedreschi is passionate about engaging both externally with customers and internally with departments across the board, from sales to engineering. Rachel’s involvement in the open source community focuses primarily on Apache Druid, a real-time, high-performance datastore that provides sub-second analytics and complements another powerful open source project: Apache Kafka®. Together, Kafka and Druid provide real-time event streaming and high-performance streaming analytics with powerful visualizations.
EPISODE LINKS
How To Use Kafka and Druid to Tame Your Router Data
23/12/2019 • 50 minutes 12 seconds
Apache Kafka 2.4 – Overview of Latest Features, Updates, and KIPs
Apache Kafka 2.4 includes new Kafka Core developments and improvements to Kafka Streams and Kafka Connect, including MirrorMaker 2.0, RocksDB metrics, and more.
EPISODE LINKS
Read about what's new in Apache Kafka 2.4
Check out the Apache Kafka 2.4 release notes
Watch the video version of this podcast
16/12/2019 • 15 minutes 4 seconds
Cloud-Native Patterns with Cornelia Davis
Developing cloud-based applications requires unique patterns and practices that make them suitable for modern cloud platforms. Host Tim Berglund catches up with Cornelia Davis, author of Cloud Native Patterns and VP of Technology at Pivotal, about what cloud-native patterns are, the example code she created, her latest book, and how she wrote it for the customers she interacts with on a daily basis.
EPISODE LINKS
Get 40% off Cloud Native Patterns with the code podcon19
Join the Confluent Community Slack
Fully managed Apache Kafka as a service! Try free.
16/12/2019 • 53 minutes 12 seconds
Ask Confluent #16: ksqlDB Edition
Vinoth Chandar has led various infrastructure projects at Uber and is one of the main drivers behind the ksqlDB project. In this episode hosted by Gwen Shapira (Engineering Manager, Cloud-Native Apache Kafka®), Vinoth and Gwen discuss what ksqlDB is, the kinds of applications that you can build with it, vulnerabilities, and various ksqlDB use cases. They also talk about which Apache Kafka version currently offers the best performance improvements without causing breaking changes to existing Kafka configuration and functionality.
EPISODE LINKS
Read about ksqlDB on the blog
Learn more about ksqlDB
ksqlDB Demo | The Event Streaming Database in Action
Follow ksqlDB on Twitter
12/12/2019 • 30 minutes 11 seconds
Machine Learning with Kafka Streams, Kafka Connect, and ksqlDB ft. Kai Waehner
In this episode, Kai Waehner (Senior Systems Engineer, Confluent) defines machine learning in depth, describes the architecture of his dream machine learning pipeline, explains its relevance to Apache Kafka®, Kafka Connect, ksqlDB, and the related ecosystem, and discusses the importance of security and fraud detection. He also covers Kafka use cases, including an example of how Kafka Streams and TensorFlow provide predictive analytics for connected cars.
EPISODE LINKS
How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka
Learn about Apache Kafka
04/12/2019 • 38 minutes 30 seconds
Real-Time Payments with Clojure and Apache Kafka ft. Bobby Calderwood
Streamlining banking technology to help smaller banks and credit unions thrive among financial giants is top of mind for Bobby Calderwood (Founder, Evident Systems), who started out in programming, transitioned to banking, and recently launched Evident Real-Time Payments. The product leverages Confluent Cloud to help banks of all sizes move from a traditionally batch-oriented, bankers’-hours operational mode to real-time banking services. This is achieved through Apache Kafka® and the Kafka Streams and Kafka Connect APIs, used from Clojure with functional programming paradigms like transducers. Bobby also describes his efforts to help financial services companies build their next-generation platforms on top of streaming events, including interesting use cases, hard problems that come up in payments, and solutions that make event streaming technology easy to use within established banking structures.
EPISODE LINKS
27/11/2019 • 58 minutes
Announcing ksqlDB ft. Jay Kreps
Jay Kreps (Co-creator of Apache Kafka® and CEO, Confluent) introduces ksqlDB, an event streaming database. As the successor to KSQL, ksqlDB seeks to unify the multiple systems involved in stream processing into a single, easy-to-use solution for building event streaming applications. ksqlDB supports running connectors in an embedded mode, as well as both push and pull queries. Push queries let you subscribe to query results as they change with new events, while pull queries let you look up a particular value at a single point in time. In a ride-sharing app, for example, the continuous feed of the driver's current position is a push query, and looking up a current value such as the price of the ride is a pull query. Databases are still effective in their own realms, and ksqlDB is not intended as a replacement. Rather, ksqlDB enables you to build event streaming applications with the same ease and familiarity
20/11/2019 • 26 minutes 57 seconds
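To make the push/pull distinction from the ksqlDB episode above concrete, here is a minimal Python sketch of the ride-sharing example. It is a toy in-memory model, not ksqlDB or its API: `ToyTable`, `subscribe`, `apply`, and `get` are hypothetical names chosen for illustration. A push query is modeled as a callback that fires on every change, and a pull query as a one-off lookup of the current value.

```python
from __future__ import annotations

from collections.abc import Callable


class ToyTable:
    """Toy materialized table keyed by ride ID (hypothetical; not the ksqlDB API)."""

    def __init__(self) -> None:
        self._state: dict[str, dict] = {}
        self._subscribers: list[Callable[[str, dict], None]] = []

    def subscribe(self, callback: Callable[[str, dict], None]) -> None:
        # Push-style: the callback receives every future change to any row.
        self._subscribers.append(callback)

    def apply(self, key: str, update: dict) -> None:
        # Merge an incoming event into the current row for its key,
        # then push the new row to every subscriber.
        row = {**self._state.get(key, {}), **update}
        self._state[key] = row
        for callback in self._subscribers:
            callback(key, row)

    def get(self, key: str) -> dict | None:
        # Pull-style: a one-off lookup of the current value, or None if absent.
        return self._state.get(key)


positions = []
table = ToyTable()
table.subscribe(lambda key, row: positions.append(row["position"]))

table.apply("ride-1", {"position": (40.0, -105.0), "price": 12.5})
table.apply("ride-1", {"position": (40.1, -105.1)})

print(positions)            # [(40.0, -105.0), (40.1, -105.1)]
print(table.get("ride-1"))  # {'position': (40.1, -105.1), 'price': 12.5}
```

Serving both kinds of query from one piece of materialized state mirrors the idea of a single system handling both access patterns; a real event streaming database also deals with persistence, partitioning, and fault tolerance, none of which this toy attempts.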
Installing Apache Kafka with Ansible ft. Viktor Gamov and Justin Manchester
“It’s one thing to get a distributed system up and running. It’s another thing to get a distributed system up and running well.” Ansible keeps your Apache Kafka® deployment, management, and installation consistent, and it enables you to implement best practices that make it easy to get started. Justin Manchester (Platform DevOps Engineer, Confluent) and Viktor Gamov (Developer Advocate, Confluent) discuss the problems that Ansible is trying to solve, from enabling collaboration to optimizing all components for top performance.
EPISODE LINKS
Learn more about Ansible
Follow Viktor Gamov on Twitter
Follow Justin Manchester on Twitter
18/11/2019 • 46 minutes 6 seconds
Securing the Cloud with VPC Peering ft. Daniel LaMotte
Everything is moving to the cloud, which makes it increasingly important to secure your cloud infrastructure and minimize the threat of potential attackers. A virtual private cloud (VPC) is your own private network in the cloud that you can launch your own instances into, and VPC peering connects VPCs together to create a path between them, keeping your data safe and accessible to you alone. Although peering is typically performed within a single cloud provider, it is possible across more than one; think of it as your cloud router. Daniel LaMotte (Site Reliability Engineer, Confluent) walks through the details of cloud networking and VPC peering: what it is, what it does, and how to launch a VPC in the cloud. He also covers the difference between AWS PrivateLink and AWS Transit Gateway, CIDR, and accessibility across cloud providers.
EPISODE LINKS
VPC Peering in Confluent Cloud
13/11/2019 • 31 minutes 56 seconds
ETL and Event Streaming Explained ft. Stewart Bryson
Migrating from traditional ETL tools to an event streaming platform is a process that Stewart Bryson (CEO and founder, Red Pill Analytics) is no stranger to. In this episode, he dispels misconceptions around what “streaming ETL” means, and explains why event streaming and event-driven architectures compel us to rethink old approaches: not all data is corporate data anymore, not all data is relational data anymore, and the cost of storing data is now negligible. Supporting modern, distributed event streaming platforms, along with the shift of focus from on-premises to the cloud, introduces new use cases that focus primarily on building new systems and rebuilding existing ones. From Kafka Connect and stack applications to the importance of tables, events, and logs, Stewart also discusses Gradle and how it’s being used at Red Pill Analytics.
EPISODE LINKS
06/11/2019 • 49 minutes 42 seconds
The Pro’s Guide to Fully Managed Apache Kafka Services ft. Ricardo Ferreira
Several definitions of a fully managed Apache Kafka® service have floated around, but Ricardo Ferreira (Developer Advocate, Confluent) breaks down what it truly means and why every developer should care. Addressing a handful of questions around Apache Kafka, Confluent Cloud, hosted solutions, and how they all work, Ricardo describes the benefits of using a fully managed service as a means of simplifying the lives of developers and letting them get back to building, which is why they became developers in the first place!
EPISODE LINKS
The Rise of Managed Services for Apache Kafka
Excerpt from The Beginner's Guide to Mathematica, Version 4
Jay Kreps’ keynote at Kafka Summit SF 2019
Neha Narkhede
04/11/2019 • 56 minutes 28 seconds
Kafka Screams: The Scariest JIRAs and How To Survive Them ft. Anna McDonald
In today's spooktacular episode of Streaming Audio, Anna McDonald (Technical Account Manager, Confluent) discusses six of the scariest Apache Kafka® JIRAs. Starting with KAFKA-6431: Lock Contention in Purgatory, Anna breaks down what purgatory is and why it’s not something to fear or avoid. Next, she dives into KAFKA-8522: Tombstones Can Survive Forever, where she explains tombstones, compacted topics, null values, and log compaction. Then there’s KAFKA-6880: Zombie Replicas Must Be Fenced, which sounds like the spookiest of them all. KAFKA-8233, which focuses on the new TestTopology mummy (wrapper) class, provides one option for setting the topology through your Kafka Streams application. As Anna puts it, "This opens doors for people to build better, more resilient, and more interesting topologies." To close out the episode, Anna talks about two more JIRAs: KAFKA-6738, which focuses on the Kafka Connect dead letter queue as a
30/10/2019 • 46 minutes 32 seconds
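The tombstone and log compaction ideas behind KAFKA-8522 in the episode above can be illustrated with a minimal Python sketch. It is a simplified model, not Kafka's log cleaner: a compacted topic keeps the latest record for each key, and a record with a null value (a tombstone, modeled here as `None`) marks its key for deletion. The function name `compact` is hypothetical, and unlike real Kafka, which retains tombstones for `delete.retention.ms` before removing them, this sketch drops them immediately.

```python
def compact(log):
    """Toy log compaction over a list of (key, value) records.

    Keeps only the latest record per key, at its original offset, and drops
    keys whose latest value is a tombstone (None). Real Kafka cleans closed
    segments in the background and keeps tombstones for delete.retention.ms;
    this sketch removes them at once.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None  # tombstoned keys disappear entirely
    ]
    return sorted(survivors)  # offsets are unique, so this restores log order


log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
print(compact(log))  # [(2, 'a', 3), (4, 'c', 4)]
```

Offsets are preserved rather than renumbered, which is also how Kafka behaves: compaction leaves gaps in the offset sequence instead of rewriting positions that consumers may already have committed.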
Data Integration with Apache Kafka and Attunity
From change data capture (CDC) to business development, connecting Apache Kafka® environments, and customer success stories, Graham Hainbach discusses the possibilities of data integration with Kafka and Attunity using Replicate, Compose, and Enterprise Manager. He also shares real-life examples of how Attunity best leverages Kafka in their systems.
EPISODE LINKS
Apache Kafka Transaction Data Streaming for Dummies
Join the Confluent Community Slack
Fully managed Apache Kafka as a service! Try free.
28/10/2019 • 43 minutes 49 seconds
Distributed Systems Engineering with Apache Kafka ft. Colin McCabe
Colin McCabe talks about what it’s like being a distributed systems engineer on the Core Kafka team at Confluent, where he has worked previously, and how that led to his interest in Apache Kafka®. As an active member of the Apache open source community, he describes the community as a place that both welcomes newcomers and fosters different ideas that help make the product the best that it can be for everyone. Being a distributed systems engineer versus a full stack engineer comes with its own unique challenges. Colin offers some advice for those interested in working with Kafka and explains what the interview process is like at Confluent. It’s not all about what you know, but rather how you collaborate and contribute to the team, and how you get to the answer. Part of finding the answer is getting involved with Apache projects themselves by engaging with others and helping with bug fixes as much as possible, because it’ll help you gain a better grasp on a technology that is ever
23/10/2019 • 45 minutes 41 seconds
Apache Kafka on Kubernetes, Microsoft Azure, and ZooKeeper with Lena Hall
Lena Hall joins Tim Berglund in the studio to talk about Apache Kafka®, the various ways to run Kafka on Microsoft Azure, Kafka on Kubernetes (K8s), and some exciting events that are happening in the Kafka world. Lena explains what it’s like to serve double duty as both a senior software engineer and a senior cloud developer advocate for Azure Engineering, including her unique roles and responsibilities and how she balances engineering with advocacy. From writing tech articles to her experience with fuzzing and her presence on YouTube, Lena is a strong community supporter who believes in the importance of staying rooted in the world of code as an advocate, because it helps you better understand common challenges and gives you insight as an engineer trying to fix them. It’s important to ask what's good about a technology and how it can be improved. They also discuss Kubernetes, the benefits of running Kafka on Kubernetes, why it’s popular, and using systems that can integrate with it. With
16/10/2019 • 46 minutes 8 seconds
Improving Fairness Through Connection Throttling in the Cloud with KIP-402 ft. Gwen Shapira
The focus of KIP-402 is to improve fairness in how Apache Kafka® processes connections and how network threads pick up requests and new data. Gwen Shapira (Engineering Manager for Cloud-Native Kafka, Confluent) outlines the details of this KIP and her team’s efforts to make user-facing Kafka improvements. Halfway through the episode, Gwen shares how to send metadata and produce client messages.
EPISODE LINKS
KIP-402: Improve fairness in SocketServer processors
Join the Confluent Community Slack
Fully managed Apache Kafka as a service! Try free.
09/10/2019 • 48 minutes 37 seconds
Data Modeling for Apache Kafka – Streams, Topics & More with Dani Traphagen
Helping users succeed with Apache Kafka® is a large part of Dani Traphagen’s role as a senior systems engineer at Confluent. Whether she’s advising companies on implementing parts of Kafka or rebuilding their systems entirely from the ground up, Dani is passionate about event-driven architecture and the way streaming data provides real-time insights on business activity. She explains the concepts of a stream, topic, key, and stream-table duality, and how each of these pieces relates to the others. When it comes to data modeling, Dani covers important business requirements, including the need for a domain model, practicing domain-driven design principles, and bounded context. She also discusses the attributes of data modeling (time, source, key, header, metadata, and payload), explores the significance of data governance and lineage, and covers performing joins.
EPISODE LINKS
07/10/2019 • 40 minutes 25 seconds
MySQL, Cassandra, BigQuery, and Streaming Analytics with Joy Gao
Joy Gao chats with Tim Berglund about all things related to streaming ETL: how it works, its benefits, and the implementation and operational challenges involved. She describes the streaming ETL architecture at WePay from MySQL/Cassandra to BigQuery using Apache Kafka®, Kafka Connect, and Debezium.
EPISODE LINKS
Cassandra Source Connector Documentation
Streaming Databases in Real Time with MySQL, Debezium, and Kafka
Streaming Cassandra at WePay
Change Data Capture with Debezium ft. Gunnar Morling
02/10/2019 • 43 minutes 59 seconds
Scaling Apache Kafka with Todd Palino
Todd Palino, a senior SRE at LinkedIn, talks about the start of Apache Kafka® at LinkedIn, what learning to use Kafka was like, how Kafka has changed, and what he and others in the community hope for in the future of Kafka. If you’re curious about life as an SRE, Todd shares the details on that too, and goes into how Kafka is used at LinkedIn, as well as several wins and challenges with the product over the years.
EPISODE LINKS
Kafka: The Definitive Guide by Neha Narkhede, Gwen Shapira & Todd Palino
URP? Excuse You! The Three Metrics You Have to Know
Join the Confluent Community Slack
25/09/2019 • 46 minutes 3 seconds
Understand What’s Flying Above You with Kafka Streams ft. Neil Buesing
Neil Buesing (Director of Real-Time Data, Object Partners) discusses what a day in his life looks like and how Kafka Streams helps analyze flight data.
EPISODE LINKS
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL and KSQL
Kafka: The Definitive Guide by Neha Narkhede, Gwen Shapira & Todd Palino
Read the Confluent blog
Join the Confluent Community Slack
23/09/2019 • 13 minutes
KIP-500: Apache Kafka Without ZooKeeper ft. Colin McCabe and Jason Gustafson
Tim Berglund sits down with Colin McCabe and Jason Gustafson to talk about KIP-500. The pair, who work on the Kafka Core Engineering Team, discuss the history of Kafka, the creation of KIP-500, and what it will do for the community as a whole. They break down ZooKeeper's role in Kafka, the implications of removing the ZooKeeper dependency and replacing it with a self-managed metadata quorum, and how they've been combating security, stability, and compatibility issues. With pending improvements to scalability and inter-broker communication, and now that KIP-500 has been adopted by the community, there's a lot covered in this episode that you won't want to miss!
EPISODE LINKS
KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
18/09/2019 • 43 minutes 46 seconds
Should You Run Apache Kafka on Kubernetes? ft. Balthazar Rouberol
When it comes to deploying applications at scale without needing to integrate different pieces of infrastructure yourself, the answer nowadays is increasingly Kubernetes. Kubernetes provides all the building blocks that are needed, but a lot of thought is still required to truly create an enterprise-grade Apache Kafka® platform that can be used in production. Before running Kafka on Kubernetes, there are some factors to consider. What are the maturity stages of Kubernetes adoption? How did Datadog experience these stages? Balthazar Rouberol shares what to think about before hopping on the Kubernetes hype train.
EPISODE LINKS
Kafka-Kit: Tools for Scaling Kafka
Running Production Kafka Clusters in Kubernetes
Join the Confluent Community Slack
16/09/2019 • 29 minutes 38 seconds
Jay Kreps on the Last 10 Years of Apache Kafka and Event Streaming
As Confluent turns five years old, special guest Jay Kreps (Co-founder and CEO, Confluent) takes us back to his early development days of coding Apache Kafka® over a Christmas holiday while working at LinkedIn. Kafka has become a breakthrough open source distributed streaming platform based on an abstraction of the distributed commit log, and his involvement in the project eventually led him to start Confluent with Jun Rao and Neha Narkhede. In this episode, Jay shares the highs and lows along the way, including some of his favorite customer success stories with companies like Lyft and Euronext, which empower their real-time businesses through event streaming with Confluent Cloud. Starting a company involves more than the technology, and Jay also reflects on some of the challenges around funding, support, and introducing Confluent to the rest of the world. How they have brought us from the beginning to now yields some wise words from Jay to
12/09/2019 • 48 minutes 25 seconds
Connecting to Apache Kafka with Neo4j
What’s a graph? How does Cypher work? In today's episode of Streaming Audio, Tim Berglund sits down with Michael Hunger (Lead of Neo4j Labs) and David Allen (Partner Solution Architect, Neo4j) to discuss Neo4j basics and get the scoop on major features introduced in Neo4j 3.4 and 3.5. Among these are geospatial and temporal types, and there’s more to come in 4.0: a multi-database feature, fine-grained security, and reactive drivers/Spring Data Neo4j RX. In addition to sharing a little bit about the history of the integration and its features in relation to Apache Kafka®, they also discuss change data capture (CDC), using Neo4j to put graph operations into an event streaming application, and how GraphQL fits in with event streaming and GRANDstack. The goal is to add graph abilities to help any distributed application become more successful.
EPISODE LINKS
Kafka Connect Neo4j Sink
09/09/2019 • 54 minutes 29 seconds
Ask Confluent #15: Attack of the Zombie Controller
Gwen Shapira (Core Kafka Software Engineer, Confluent) sits down to answer the questions you've had about event streaming, Apache Kafka®, Confluent, and everything in between. These include creating tables in nested JSON topics, how to balance ordering, latency, and reliability, building event-based systems, and how to navigate the tricky endOffsets API. She talks about the hardships of fencing zombie requests, some of the talks given at previous Kafka Summits, and an important question from Ask Confluent #3.
EPISODE LINKS
KIP-91: Provide Intuitive User Timeouts in The Producer
KIP-79: ListOffsetRequest/ListOffsetResponse v1 and add timestamp search methods to the new consumer
04/09/2019 • 22 minutes 27 seconds
Helping Healthcare with Apache Kafka and KSQL ft. Ramesh Sringeri
In today’s episode of Streaming Audio, Tim Berglund sits down with Ramesh Sringeri, Senior Applications Developer of Mobile Solutions, to discuss Apache Kafka®, specifically two Kafka use cases that Children’s Healthcare of Atlanta is working on. First, they discuss achieving near-real-time streams of data to support meaningful prediction of intracranial pressure (ICP) and to manage ICP in a timely manner, helping the care team achieve better outcomes with traumatic brain injuries. Children’s Healthcare of Atlanta is in the process of building machine learning models for predicting ICP values 30 and 60 minutes in the future. This will help the care team better prepare for potential adverse conditions, where elevated ICP values could lead to undesirable outcomes. The Children’s team is using Kafka, KSQL, and Kafka Streams programs to build a pipeline in which they can test their machine learning models. Ramesh also shares about the work t
28/08/2019 • 52 minutes 47 seconds
Contributing to Open Source with the Kafka Connect MongoDB Sink ft. Hans-Peter Grahsl
Sink and source connectors are important for getting data in and out of Apache Kafka®. Tim Berglund invites Hans-Peter Grahsl (Technical Trainer and Software Engineer, Netconomy Software & Consulting GmbH) to talk about his involvement in the Apache Kafka project, spanning from several conference contributions all the way to his open source community sink connector for MongoDB, now part of the official MongoDB Kafka connector code base. Join us in this episode to learn what it’s like to be the only maintainer of a side project that several companies have deployed into production!
EPISODE LINKS
MongoDB Connector for Apache Kafka
Getting Started with the MongoDB Connector for Apache Kafka and MongoDB
21/08/2019 • 50 minutes 22 seconds
Teaching Apache Kafka Online with Stéphane Maarek
Streaming Audio welcomes Stéphane Maarek (CEO, Datacumulus) on the podcast to discuss how he got started hosting online Apache Kafka® tutorials and teaching on Udemy, the challenges he faces as an instructor, his approach to answering hard questions, and the projects he is currently working on.
EPISODE LINKS
KSQL Training for Hands-On Learning
Join the Confluent Community Slack
19/08/2019 • 42 minutes 22 seconds
Connecting Apache Cassandra to Apache Kafka with Jeff Carpenter from DataStax
Whenever you see an Apache Cassandra™ in the wild, you probably also see an Apache Kafka®. In this episode, Tim Berglund (Senior Director of Developer Experience, Confluent) and Jeff Carpenter (Director of Developer Advocacy, DataStax) discuss the best way to get those systems talking using the DataStax Apache Kafka Connector and to build a real-time data pipeline.
EPISODE LINKS
About the DataStax Apache Kafka Connector
DataStax Academy: DataStax Apache Kafka Connector Course
Join the Confluent Community Slack
12/08/2019 • 47 minutes 58 seconds
Transparent GDPR Encryption with David Jacot
The General Data Protection Regulation (GDPR) has challenged many enterprises to rethink how they deal with customer data. Viktor Gamov chats with David Jacot about a unique approach to inter-broker traffic encryption that he implemented for his customer’s sidecar pattern use case.
EPISODE LINKS
Learn about Istio
Learn about Envoy
Learn about Linkerd
Handling GDPR with Apache Kafka®: How to Comply Without Freaking Out?
Join the Confluent Community Slack
08/08/2019 • 16 minutes 45 seconds
Confluent Platform 5.3 | What's New in This Release
A quick summary of the most important features in Confluent Platform 5.3. We discuss improved Kubernetes and Ansible support, improvements to Confluent Control Center that give you better insight into the data in your cluster, and an important new set of security features, Role-Based Access Control, aimed at making complex deployments more secure.
EPISODE LINKS
Read the docs
Read the blog
Watch the video version of this podcast (featuring an actual stream)
Download Confluent Platform 5.3
Join us in Confluent Community Slack
31/07/2019 • 13 minutes 2 seconds
How to Convert Python Batch Jobs into Kafka Streams Applications with Rishi Dhanaraj
Zenreach is a company that makes tools to help retailers use digital marketing more effectively. If that sounds like a problem that only marketing people would be interested in, that’s because you don’t know what they do! There are all kinds of fascinating technology problems to solve by using event streaming platforms to process data at volume. Rishi Dhanaraj, our guest today, worked at Zenreach as an intern and took on a big pile of Python batch jobs, turning them into some really interesting Kafka Streams code. Listen in as he walks us through how he did it.
EPISODE LINKS
A Beginner's Perspective on Kafka Streams: Building Real-Time Walkthrough Detection
Real-Time Presence Detection at Scale with Apache Kafka on AWS
29/07/2019 • 31 minutes 2 seconds
Ask Confluent #14: In Control of Kafka with Dan Norwood
Is Apache Kafka® actually a database? Can you install Confluent Control Center on Google Cloud Platform (GCP)? All this, plus some tips from Dan Norwood, the first user of Kafka Streams.
EPISODE LINKS
Control Center Docker image
Control Center Docker configuration
Complete Streams example
Watch the video version of this podcast
Join us in Confluent Community Slack
22/07/2019 • 23 minutes 50 seconds
Kafka in Action with Dylan Scott
Author Dylan Scott tells all about his upcoming Manning title Kafka in Action, which shows how beginners can use Apache Kafka® as they start out on their own projects and dispels common Hadoop-related myths, as Kafka has grown to become a powerful event streaming platform beyond big data ecosystems alone. To get 40% off Manning products, use the following code: podcon19
EPISODE LINKS
Join us in Confluent Community Slack
15/07/2019 • 38 minutes 15 seconds
Change Data Capture with Debezium ft. Gunnar Morling
Friends don’t let friends do dual writes! Gunnar Morling (Software Engineer, Red Hat) joins us on the podcast to share a little bit about what Debezium is, how it works, and which databases it supports. In addition to covering the various use cases and benefits of change data capture (CDC) in the context of microservices, touching on the outbox pattern in particular, Gunnar walks us through the advantages of log-based CDC as implemented through Debezium over polling-based approaches, why you’d want to avoid dual writes to multiple resources, and how he engages with members of the community to work collaboratively on Debezium.
EPISODE LINKS
Join us in Confluent Community Slack
10/07/2019 • 49 minutes 15 seconds
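The outbox pattern mentioned in the Debezium episode above avoids dual writes by committing the business change and the event describing it in one local database transaction; a log-based CDC tool such as Debezium can then stream the outbox table's inserts to Kafka. The minimal Python sketch below shows only the transactional half, using the standard library's `sqlite3`. The table names, columns, and the `place_order` function are hypothetical, and the CDC step is not shown.

```python
import json
import sqlite3


def place_order(conn: sqlite3.Connection, order_id: str, amount: float) -> None:
    # One local transaction covers both rows: `with conn` commits if the block
    # succeeds and rolls back if it raises, so the order and its event are
    # either both stored or both absent. The statement order does not matter
    # inside one transaction; writing the outbox row first simply makes the
    # rollback visible in the demo below.
    with conn:
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderPlaced", json.dumps({"id": order_id, "amount": amount})),
        )
        conn.execute(
            "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " aggregate_id TEXT, event_type TEXT, payload TEXT)"
)

place_order(conn, "o-1", 25.0)
try:
    # Duplicate order ID: the orders insert fails, so the outbox row written
    # earlier in the same transaction is rolled back as well.
    place_order(conn, "o-1", 30.0)
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0])  # 1
```

Because the event only exists if the order does, a CDC connector reading the outbox table never publishes an event for a write that was rolled back, which is the failure mode dual writes to a database and Kafka are prone to.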
Distributed Systems Engineering with Apache Kafka ft. Jason Gustafson
Ever wonder what it’s like to be a distributed systems engineer at Confluent? Core Kafka Engineer Jason Gustafson dives into the challenges of working on distributed systems, particularly a unique system like Apache Kafka®. He also discusses ways in which Confluent is working with the community to solve active problems and what it takes to be a distributed systems engineer. As always, Confluent is looking for engineers who are interested in distributed systems, and you don’t need 10 years of experience to do it!
EPISODE LINKS
KIP-392: Allow consumers to fetch from closest replica
Kafka Improvement Proposals
How to contribute
02/07/2019 • 45 minutes 56 seconds
Apache Kafka 2.3 | What's New in This Release + Updates and KIPs
Tim Berglund (Senior Director of Developer Experience, Confluent) explains what’s new in Apache Kafka® 2.3 and highlights some of the most important Kafka Improvement Proposals (KIPs).
EPISODE LINKS
Read the blog
Watch the video version of this podcast
25/06/2019 • 13 minutes 42 seconds
Rolling Kafka Upgrades and Confluent Cloud ft. Gwen Shapira
If you operate a Kafka cluster, hopefully you upgrade your brokers occasionally. Each release of Apache Kafka® includes detailed documentation that describes a tested procedure for doing a rolling upgrade of your cluster. Couldn’t be easier, right? Well, what if you have to do it with hundreds or thousands of brokers, as you would if you were running Confluent Cloud? Today, Gwen Shapira shares some of the lessons she’s learned doing just that.
EPISODE LINKS
Fully managed Apache Kafka as a service! Try free.
25/06/2019 • 42 minutes 43 seconds
Deploying Confluent Platform, from Zero to Hero ft. Mitch Henderson
Mitch Henderson (Technical Account Manager, Confluent) explains how to plan and deploy your first application running on Confluent Platform. He covers critical factors to consider, like the tools and skills you should have on hand, and how to make decisions about deployment solutions. Mitch also walks you through how to go about setting up monitoring and testing, the marks of success, and what to do after your first project launches successfully.
18/06/2019 • 32 minutes 30 seconds
Why Kafka Connect? ft. Robin Moffatt
In this episode, Tim talks to Robin Moffatt about what Kafka Connect is and why you should almost certainly use it if you're working with Apache Kafka®. Whether you're building database offload pipelines to Amazon S3, ingesting events from external datastores to drive your applications, or exposing messages from your microservices for audit and analysis, Kafka Connect is for you. Tim and Robin cover the motivating factors for Kafka Connect, why people end up reinventing the wheel when they're not aware of it, and Kafka Connect's capabilities, including scalability and resilience. They also talk about the importance of schemas in Kafka pipelines and programs, and how Confluent Schema Registry can help.
EPISODE LINKS
Kafka Connect 101 course
12/06/2019 • 46 minutes 42 seconds
Schema Registry Made Simple by Confluent Cloud ft. Magesh Nandakumar
Tim Berglund and Magesh Nandakumar (Software Engineer, Confluent) discuss why schemas matter for building systems on Apache Kafka®, and how Confluent Schema Registry helps with the problem. They talk about how Schema Registry works, how you can collaborate around schema changes through `avsc` files, and what it means for this to be available in Confluent Cloud today.
EPISODE LINKS
Schema Registry 101
Schema Management
Migrate Schemas to Confluent Cloud
Schemas, Contracts, and Compatibility
Fully managed Apache Kafka as a service! Try free.
03/06/2019 • 41 minutes 36 seconds
Why is Stream Processing Hard? ft. Michael Drogalis
Tim Berglund and Michael Drogalis (Product Lead for Kafka Streams and KSQL, Confluent) talk about all things stream processing: why it’s complex, how it's evolved, and what’s on the horizon to make it simpler.
29/05/2019 • 45 minutes 45 seconds
Testing Kafka Streams Applications with Viktor Gamov
Tim Berglund is joined by Viktor Gamov (Developer Advocate, Confluent) to discuss various approaches to testing Kafka Streams applications.
EPISODE LINKS:
- KafkaEmbedded
- TopologyTestDriver
- Mocked Streams (Scala)
- Mockafka
- Test containers
- Kafka containers
20/05/2019 • 42 minutes 33 seconds
Chris Riccomini on the History of Apache Kafka and Stream Processing
It’s a problem endemic to the tech world: we are so focused on what’s coming next that we often forget to look at where we’ve been. Chris Riccomini, who was at LinkedIn when Apache Kafka® was born, tells us how Kafka and the stream processing framework Samza came about, and what he’s doing these days at WePay: building systems that use Kafka as a primary datastore.
EPISODE LINKS:
- When It Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka
- So, You Want to Build a Kafka Connector? Source Edition.
- Kafka is Your Escape Hatch
16/05/2019 • 50 minutes 59 seconds
Ask Confluent #13: Machine Learning with Kai Waehner
Gwen and Kai chat about machine learning architectures, and whether software engineers and data scientists can learn to get along.
EPISODE LINKS:
- Blogs on deploying machine learning workloads:
  - Machine Learning with Python, Jupyter, KSQL and TensorFlow
  - How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka
  - Using Apache Kafka to Drive Cutting-Edge Machine Learning
- KIP-392: Allow consumers to fetch from closest replica
- Watch the video
08/05/2019 • 33 minutes 15 seconds
Diving into Exactly Once Semantics with Guozhang Wang
It has been said that in distributed messaging, there are two hard problems: 2) exactly once delivery, 1) guaranteed order of messages, and 2) exactly once delivery. Apache Kafka® has offered exactly once processing since version 0.11, which allows properly configured producers and consumers to guarantee that each message is processed exactly one time. In this episode, Kafka Streams engineer Guozhang Wang walks through the implementation of transactional messaging in Kafka in some detail, including the idempotent producer API, the transaction coordinator responsible for managing the transaction log, and consumer configurations. It’s a complex topic, but he takes us through it carefully and completely.
EPISODE LINKS:
- Transactions in Apache Kafka
- Enabling Exactly Once in Kafka Streams
22/04/2019 • 47 minutes 53 seconds
Ask Confluent #12: In Search of the Lost Offsets
Stanislav Kozlovski joins us to discuss common pitfalls when using Kafka consumers and a new KIP that promises to make consumer restarts much smoother.
EPISODE LINKS:
- KIP-345: Static consumer membership
- KIP-211: Documents the current behavior of offset expiration
- Watch the video version of this podcast
17/04/2019 • 22 minutes 4 seconds
Ben Stopford on Microservices and Event Streaming
Microservices are pretty ubiquitous these days. Really “SOA done right,” they reimagine the services pattern in the context of the world we live in today, nearly two decades since the first big service-oriented systems hit production. But what have we learned in this time? There are plenty of war stories. System designers have explored different architectural patterns: REST, events, and databases of all types. In this podcast, Tim Berglund and Ben Stopford explore the event-driven paradigm and how it relates to the microservice architectures we build today. Ben dives deep into coupling, evolution, and the challenges of our increasingly data-oriented culture. He also talks about a future where data are events and events are data, and touches on real-time architectures that retain the decoupling properties needed to be pluggable and to evolve. Powerful stuff.
EPISODE LINKS:
- Designing Event-Driv
08/04/2019 • 58 minutes 15 seconds
Magnus Edenhill on librdkafka 1.0
After several years of development, librdkafka has finally reached 1.0! It remains API-compatible with older versions of the library, so you won’t need to make any changes to your application. There are, however, several important new features, like the idempotent producer, sparse broker connections, support for the vaunted KIP-62, and a complete makeover for the C#/.NET client.
EPISODE LINKS:
- librdkafka v1.0.0 release notes
03/04/2019 • 46 minutes 47 seconds
Ask Confluent #11: More Services, More Metrics, More Fun
Do metrics for detecting clients running old versions actually exist? Or is Gwen making features up? Find out, along with more useful advice, in today's episode of Ask Confluent.
EPISODE LINKS:
- The Java property that makes the JVM refresh its DNS cache frequently: `java.security.Security.setProperty("networkaddress.cache.ttl", "60");`
- Improvements to DNS lookups in Confluent Platform 5.1.2 (Apache Kafka 2.1.1): KAFKA-7755, KAFKA-7890
- More reasons to upgrade to Confluent Platform 5.1.2
- Monitoring clients with old versions: KIP-188 has lots of important new metrics. If you are worried about “down-conv
26/03/2019 • 14 minutes 28 seconds
It’s Time for Streaming to Have a Maturity Model ft. Nick Dearden
Nick Dearden explains the five stages of streaming maturity. They are not denial, anger, bargaining, depression, and acceptance; that’s the Kübler-Ross model, and it’s for bad things. This one is for awesome things, and takes you from the first streaming project you ever build all the way to a state where an entire organization is transformed to think in terms of real-time, event-driven systems. If you have ever found yourself trying to get streaming technology adopted, this episode is for you!
EPISODE LINKS:
- Five Stages to Streaming Platform Adoption
18/03/2019 • 36 minutes 56 seconds
Containerized Apache Kafka On Kubernetes with Viktor Gamov
Kubernetes provides all the building blocks needed to run stateful workloads, but creating a truly enterprise-grade Apache Kafka® platform that can be used in production is not always intuitive. In this episode, Tim Berglund and Viktor Gamov address some of the challenges and pitfalls of managing Kafka on Kubernetes at scale. They also share lessons learned from the development of the Confluent Operator for Kubernetes, and answer questions like:
- What is Kubernetes?
- What are stateful workloads?
- Why are they hard?
- Will Confluent Operator make it easier?
EPISODE LINKS:
- Join the #kubernetes Slack channel
- Kafka on Kubernetes: Does it really have to be “The Hard Way”?
11/03/2019 • 41 minutes 45 seconds
Catch Your Bus with KSQL: A Stream Processing Recipe by Leslie Kurt
We all know that feeling of waiting when your ride is running late. Leslie Kurt shows how you can use KSQL to calculate the difference between a bus's expected arrival time and the real-time updates it sends as it executes its route. Listen as Leslie walks you through fundamental concepts like KTables, Kafka Streams, persistent queries, and Confluent MQTT Proxy, as well as other use cases that involve a similar mechanism of capturing Unix timestamps and performing a stream processing operation on them.
EPISODE LINKS:
- About KSQL
- Stream Processing Cookbook
- KSQL Recipe: Calculating Bus Delay Time
For more, you can check out ksqlDB, the successor to KSQL.
04/03/2019 • 19 minutes 27 seconds
KTable Update Suppression (and a Bunch About KTables) ft. John Roesler
When you are dealing with streaming data, it might seem like tables are things that dwell in the far-off land of relational databases, outside of Apache Kafka and your event streaming system. But the Kafka Streams API gives us the KTable abstraction, which lets us create tabular views of data in Kafka topics. Apache Kafka 2.1 introduced an interesting change to the table API, commonly known to the world as KIP-328, that gives you better control over how updates to tables are emitted into destination topics. What might seem like a minor detail gives us an opportunity to explore important parts of the Streams API, and it unlocks some key new use cases. Join John Roesler for a clear explanation of the whole thing.
27/02/2019 • 45 minutes 56 seconds
Splitting and Routing Events with KSQL ft. Pascal Vantrepote
Tim Berglund chats with System Engineer Pascal Vantrepote about a KSQL recipe he created based on a real-life customer use case in the financial services industry. They also discuss the advantages of KSQL, such as its expressiveness and ease of deployment in places where you’re not already writing a Java application.
EPISODE LINKS:
- About KSQL
- Stream Processing Cookbook
- KSQL Recipe: Data Routing Joined with a KTable
For more, you can check out ksqlDB, the successor to KSQL.
25/02/2019 • 20 minutes 42 seconds
Ask Confluent #10: Cooperative Rebalances for Kafka Connect ft. Konstantine Karantasis
Want to know how Kafka Connect distributes tasks to workers? Always thought Connect rebalances could be improved? In this episode of Ask Confluent, Gwen Shapira speaks with Konstantine Karantasis, software engineer at Confluent, about the latest improvements to Kafka Connect and how to run the Confluent CLI on Windows.
EPISODE LINKS:
- Improved rebalancing for Kafka Connect
- Improved rebalancing for Kafka Streams
- The "what would Kafka do?" scenario from Mark Papadakis
- The future of retail at
20/02/2019 • 21 minutes 29 seconds
The Future of Serverless and Streaming with Neil Avery
Neil Avery explores the intersection between function as a service (FaaS) and event streaming applications, before taking a quick detour back in time to understand how event-driven applications got to this point. He explains the pros and cons of FaaS, and why, in its current state, cold starts and latency need to be part of the bigger picture when building streaming applications. Finally, Neil shares five rules that will help you understand how FaaS fits into event streaming applications.
EPISODE LINKS:
- Journey to Event Driven – Part 1: Why Event-First Thinking Changes Everything
- Journey to Event Driven – Part 2: Programming Models for the Event-Driven Architecture
14/02/2019 • 41 minutes
Using Terraform and Confluent Cloud with Ricardo Ferreira
Tim Berglund hosts Developer Advocate Ricardo Ferreira to discuss the concept of infrastructure as code, as well as the differences between Terraform, Ansible, Puppet, and Chef. They also chat about why Terraform is such a big deal, some of the challenges involved in learning it, and how Confluent leverages Terraform to achieve multi-cloud support for Confluent Cloud and tools for Confluent Platform.
EPISODE LINKS:
- Terraform
- Tools for Confluent Cloud Clusters
- Fully managed Apache Kafka as a service! Try free.
23/01/2019 • 28 minutes 57 seconds
Ask Confluent #9: With and Without ZooKeeper
Gwen asks: What happens when garbage collection causes Kafka to pause? And how do we run a Schema Registry cluster? We’ll find out in this episode of Ask Confluent. In "Ask Confluent," Gwen Shapira (Software Engineer, Confluent) and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- ZooKeeper connection timeout configuration: `zookeeper.connection.timeout.ms` (as we said, this defaults to 6,000 ms)
- Schema Registry failover instructions
- Watch the video version of this podcast
08/01/2019 • 15 minutes 11 seconds
Ask Confluent #8: Guozhang Wang on Kafka Streams Standby Tasks
Gwen is joined in studio by special guest Guozhang Wang, Kafka Streams pioneer and engineering lead at Confluent. He’ll talk to us about standby tasks and how one deserializes message headers. In "Ask Confluent," Gwen Shapira (Data Architect, Confluent) and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- Documentation of standby tasks, including configs
- Events with different schema in same topic
- How to populate a database from Kafka and solve the parent-child relation problem
- Watch the video version of this podcast
18/12/2018 • 22 minutes 9 seconds
Ask Confluent #7: Kafka Consumers and Streams Failover Explained ft. Matthias Sax
Gwen is joined in studio by special guest Matthias J. Sax, a software engineer at Confluent. He’ll talk to us about Kafka consumers and Kafka Streams failover. In "Ask Confluent," Gwen Shapira (Data Architect, Confluent) and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- Watch the video version of this podcast
03/12/2018 • 23 minutes 51 seconds
Ask Confluent #6: Kafka, Partitions, and Exactly Once ft. Jason Gustafson
Gwen is joined in studio by special guest Jason Gustafson, a Kafka PMC member and engineer at Confluent. He’ll talk to us about two big questions in Kafka architecture: the number of partitions and exactly once. In "Ask Confluent," Gwen Shapira (Data Architect, Confluent) and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- Hardening Kafka Replication
- Kafka open issues
- Watch the video version of this podcast
05/11/2018 • 22 minutes 27 seconds
Kafka Summit SF 2018 Panel | Microsoft, Slack, Confluent, University of Cambridge
Neha Narkhede (Co-founder and CTO, Confluent) leads a panel discussion at Kafka Summit SF 2018 with Kevin Scott (CTO, Microsoft), Julia Grace (Head of Infrastructure Engineering, Slack), Martin Kleppmann (Researcher, U. of Cambridge), and Jay Kreps (Co-founder and CEO, Confluent).
18/10/2018 • 34 minutes 52 seconds
Kafka Streams in Action with Bill Bejeck
Tim Berglund interviews Bill Bejeck about the Kafka Streams API and his new book, Kafka Streams in Action.
27/09/2018 • 49 minutes 8 seconds
Joins in KSQL 5.0 with Hojjat Jafarpour
KSQL 5.0 now supports stream-stream, stream-table, and table-table joins. Tim Berglund interviews Hojjat Jafarpour about all three join types, how they work, what their limitations are, and the new kinds of operations they unlock. For more, you can check out ksqlDB, the successor to KSQL.
20/09/2018 • 29 minutes 5 seconds
Ask Confluent #5: Kafka, KSQL and Viktor Gamov
Gwen is joined in studio by co-host Tim Berglund and special guest Viktor Gamov, a new member of Confluent’s Developer Experience Team specializing in Kafka, KSQL and Kubernetes. In this episode, we’ll find out: Does Viktor know what he’s talking about?
EPISODE LINKS:
- Watch the video version of this podcast
10/09/2018 • 31 minutes 14 seconds
KSQL Use Cases with Nick Dearden
A discussion with Nick Dearden, stream processing expert at Confluent, about how people actually use KSQL. Try KSQL! For more, you can check out ksqlDB, the successor to KSQL.
06/09/2018 • 32 minutes 5 seconds
Nested Data in KSQL with Hojjat Jafarpour
Interesting data isn't a polite little list of scalar types. Sometimes you have more complex structures and things like nesting. We'll see how KSQL supports that today as Tim Berglund discusses nested data in KSQL with Hojjat Jafarpour, a software engineer on the KSQL team at Confluent.
EPISODE LINKS:
- KSQL demos and info
- KSQL GitHub
- KSQL Slack (#ksql channel)
For more, you can check out ksqlDB, the successor to KSQL.
29/08/2018 • 13 minutes 20 seconds
UDFs and UDAFs in KSQL 5.0 with Hojjat Jafarpour
KSQL has a solid library of built-in functions, but no library is ever good enough. What if you want to write your own? We’ll learn how today with Hojjat Jafarpour, a software engineer on the KSQL team at Confluent. For more, you can check out ksqlDB, the successor to KSQL.
24/08/2018 • 18 minutes 36 seconds
Ask Confluent #4: The GitHub Edition
Want to see a feature implemented in KSQL or another Kafka-related project? Gwen answers your questions from YouTube and walks through how to use GitHub issues to request features. This is episode #4 of "Ask Confluent," a segment in which Gwen Shapira and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- Watch the video version of this podcast
16/08/2018 • 13 minutes 59 seconds
Deep Dive into KSQL with Hojjat Jafarpour
Ever wonder what actually goes on when you run a KSQL query? Today, we take a deep dive into KSQL with Hojjat Jafarpour, a software engineer on the KSQL team at Confluent. For more, you can check out ksqlDB, the successor to KSQL.
13/08/2018 • 33 minutes 18 seconds
Ask Confluent #3: Kafka Upgrades, Cloud APIs and Data Durability
Tim Berglund and Gwen Shapira talk with Koelli Mungee (Customer Operations Lead, Confluent) about the latest Apache Kafka upgrades, cloud APIs, and data durability. This is episode #3 of "Ask Confluent," a segment in which Gwen Shapira and guests respond to a handful of questions and comments from Twitter, YouTube, and elsewhere.
EPISODE LINKS:
- Watch the video version of this podcast
- Fully managed Apache Kafka as a service! Try free.
20/07/2018 • 22 minutes 34 seconds
Ask Confluent #2: Consumers, Culture and Support
Gwen Shapira answers your questions and interviews Sam Hecht (Head of Support, Confluent). This is the second episode of "Ask Confluent," a segment in which Gwen Shapira and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- Watch the video version of this podcast
02/07/2018 • 24 minutes 22 seconds
Ask Confluent #1: Kubernetes, Confluent Operator, Kafka and KSQL
Tim Berglund and Gwen Shapira discuss Kubernetes, Confluent Operator, Kafka, KSQL, and more. This is the first episode of "Ask Confluent," a segment in which Gwen Shapira and guests respond to a handful of questions and comments from Twitter, YouTube and elsewhere.
EPISODE LINKS:
- Watch the video version of this podcast