Apache Kafka on AWS (Amazon Managed Streaming for Apache Kafka / MSK) by Frank Munz

October 28, 2022

Hot off the press :-) I published a new article about streaming data pipelines with Kafka and Delta Live Tables. It covers more advanced material for once you have seen this video. The article includes a demo with a live Twitter stream and a bit of ML, namely Hugging Face sentiment analysis: https://www.databricks.com/blog/2022/08/09/low-latency-streaming-data-pipelines-with-delta-live-tables-and-apache-kafka.html

But now back to the Kafka basics:

Apache Kafka is one of the most popular open-source projects for building messaging and streaming applications. Kafka takes data from various sources, makes it available to different applications, and therefore helps to eliminate daily batch jobs.
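To make the "one source, many applications" idea concrete, here is a toy in-memory sketch in Python. It is not Kafka's API and not how MSK is implemented; `ToyTopic`, `produce`, `poll`, and the group names are hypothetical names chosen for illustration. It only models the core idea of an append-only log that several consumer groups read independently, each with its own offset.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic: an append-only log."""

    def __init__(self):
        self.log = []                      # records in arrival order
        self.offsets = defaultdict(int)    # next offset to read, per consumer group

    def produce(self, record):
        """Append a record to the log and return its offset."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group, max_records=10):
        """Return unread records for `group`; advance only that group's offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["order-1", "order-2", "order-3"]:
    topic.produce(event)

# Two hypothetical applications read the same data at their own pace.
billing_first = topic.poll("billing", max_records=2)
analytics_all = topic.poll("analytics")
billing_rest = topic.poll("billing")
```

Reading does not remove records, so the `analytics` group still sees all three events after `billing` has consumed some of them. That is the property that lets applications consume events as they arrive instead of waiting for a nightly batch export.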

Kafka plays an important role in Change Data Capture (CDC) and in the world of microservices. This presentation gives an overview of the new Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Based on knowledge gained from several Kafka implementation projects, I will first explain some of the technical underpinnings. You will learn about brokers, topics, and ZooKeeper. Then I will explain what makes Kafka special, analyse major pain points in on-prem Kafka projects, take a critical look at how Kafka differs from Kinesis, and explain why the cloud is the best way to use Kafka.