Demystifying Kafka: A Deep Dive into the Powerful Messaging System

August 9, 2023 by Dhawal

Kafka is an open-source distributed messaging system that provides a fault-tolerant and scalable platform for handling real-time data streams. Kafka was initially developed at LinkedIn and later donated to the Apache Software Foundation. Strimzi, an open-source project, is used to deploy Kafka on Kubernetes.

Architecture

Kafka can be deployed in one of three topologies:

  1. Single-node single-broker
  2. Single-node multiple-broker
  3. Multiple-node multiple-broker cluster

Kafka follows a pub-sub (publish-subscribe) model, where producers write data to topics and consumers subscribe to those topics to read it.

Architecture components:

1] Topic

  • A topic is a named category (feed) to which producers publish messages and from which consumers consume them.
  • It is the unit by which data is organized and partitioned in a distributed manner across the cluster; a minimal topic-creation sketch follows this list.
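
To make the topic concept concrete, here is a minimal sketch that creates a topic with Kafka's Java AdminClient. The broker address localhost:9092, the topic name "orders", and the partition/replication counts are assumptions chosen for illustration.

  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.clients.admin.NewTopic;

  import java.util.Collections;
  import java.util.Properties;

  public class CreateTopicExample {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          // "localhost:9092" is an assumed broker address; adjust for your cluster.
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

          try (AdminClient admin = AdminClient.create(props)) {
              // Hypothetical topic "orders" with 3 partitions and a replication factor of 2.
              NewTopic topic = new NewTopic("orders", 3, (short) 2);
              admin.createTopics(Collections.singleton(topic)).all().get();
          }
      }
  }

The partition count chosen here matters because it bounds how many consumers in a group can read the topic in parallel.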

2] Broker

  • A Kafka broker is a part of the Kafka cluster responsible for storing and managing the published data in topics.
  • Brokers receive messages from producers and store them in topic partitions. Each partition is an ordered, immutable sequence of messages.
  • Brokers replicate data across multiple nodes, which provides fault tolerance. A topic can have multiple partitions, and each partition can be hosted on a different broker.
  • Brokers serve consumers' requests for data retrieval; a sketch that inspects partition placement across brokers follows this list.
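
As a sketch of how partitions and replicas are spread over brokers, the AdminClient snippet below describes the hypothetical "orders" topic and prints, for each partition, the broker acting as leader and the brokers holding replicas. The broker address is assumed, and allTopicNames() assumes a reasonably recent kafka-clients library (3.1 or newer).

  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.clients.admin.TopicDescription;
  import org.apache.kafka.common.TopicPartitionInfo;

  import java.util.Collections;
  import java.util.Properties;

  public class DescribeTopicExample {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

          try (AdminClient admin = AdminClient.create(props)) {
              TopicDescription desc = admin.describeTopics(Collections.singleton("orders"))
                                           .allTopicNames().get()
                                           .get("orders");
              // Each partition reports the broker that leads it and the brokers holding replicas.
              for (TopicPartitionInfo p : desc.partitions()) {
                  System.out.printf("partition %d leader=%s replicas=%s isr=%s%n",
                          p.partition(), p.leader(), p.replicas(), p.isr());
              }
          }
      }
  }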

3] Producer

  • A Kafka producer is an application or service that publishes data to Kafka topics.
  • The producer specifies the topic to which it wants to send the data and produces a message containing the data to be published.
  • Once the producer sends a message, it receives an acknowledgment from Kafka indicating whether the message was published successfully (see the producer sketch after this list).
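
Below is a minimal producer sketch, assuming a broker at localhost:9092 and the hypothetical "orders" topic. The acks setting and the send callback show how the acknowledgment mentioned above surfaces in client code.

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  import java.util.Properties;

  public class ProducerExample {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas to acknowledge

          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              // Hypothetical key and value; the key determines the target partition.
              ProducerRecord<String, String> record =
                      new ProducerRecord<>("orders", "order-123", "{\"amount\": 42}");
              // The callback delivers the acknowledgment: either metadata (partition, offset) or an error.
              producer.send(record, (metadata, exception) -> {
                  if (exception != null) {
                      exception.printStackTrace();
                  } else {
                      System.out.printf("written to partition %d at offset %d%n",
                              metadata.partition(), metadata.offset());
                  }
              });
              producer.flush();
          }
      }
  }

Setting acks=all makes Kafka wait until all in-sync replicas have the record before acknowledging, trading a little latency for stronger durability.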

4] Consumer

  • A Kafka consumer is an application or service that reads data from Kafka topics.
  • Consumers subscribe to the topics of their interest and consume messages from topic partitions.
  • Consumers can choose where to start reading, e.g. from the beginning of a partition or from a specific offset.
  • Each consumer in a consumer group reads from one or more partitions, and Kafka ensures that each partition is consumed by only one consumer within the group. This allows for parallel processing and load distribution (a minimal consumer sketch follows this list).
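
Here is a minimal consumer sketch under the same assumptions (broker at localhost:9092, topic "orders", plus a hypothetical consumer group order-processors). auto.offset.reset controls where a new group starts reading, and commitSync() records the group's progress.

  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.serialization.StringDeserializer;

  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;

  public class ConsumerExample {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
          props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");        // hypothetical group
          props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
          props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
          // Start from the beginning of each partition if the group has no committed offset yet.
          props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
          props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singletonList("orders"));
              while (true) {
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                  for (ConsumerRecord<String, String> record : records) {
                      System.out.printf("partition=%d offset=%d value=%s%n",
                              record.partition(), record.offset(), record.value());
                  }
                  consumer.commitSync(); // record the group's progress (committed offsets)
              }
          }
      }
  }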

Workflow:

  1. The producer publishes messages to Kafka topics through the Kafka cluster, with the messages being written to specific partitions in a topic.
  2. The brokers store the messages in topic partitions and replicate them across the cluster for fault tolerance.
  3. Consumers subscribe to topics and read messages from partitions, processing them according to their application logic and committing offsets (committed offsets can be inspected as in the sketch after this list).
  4. The Kafka cluster ensures that each partition is consumed by only one consumer within a consumer group, providing load balancing and scalability.
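
To tie the workflow together, this sketch uses the AdminClient to read the committed offsets of the hypothetical order-processors group, i.e. how far each partition has been consumed. The broker address and group name are assumptions carried over from the sketches above.

  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.clients.consumer.OffsetAndMetadata;
  import org.apache.kafka.common.TopicPartition;

  import java.util.Map;
  import java.util.Properties;

  public class GroupOffsetsExample {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

          try (AdminClient admin = AdminClient.create(props)) {
              // "order-processors" is the hypothetical consumer group from the consumer sketch above.
              Map<TopicPartition, OffsetAndMetadata> offsets =
                      admin.listConsumerGroupOffsets("order-processors")
                           .partitionsToOffsetAndMetadata()
                           .get();
              offsets.forEach((partition, offset) ->
                      System.out.printf("%s -> committed offset %d%n", partition, offset.offset()));
          }
      }
  }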

Overall, Kafka’s architecture and workflow make it a powerful platform for handling real-time data streams and building distributed, scalable, and fault-tolerant data processing applications.