What is Kafka?
Apache Kafka is an open-source distributed event-streaming platform. It drives seamless message exchange between services and powers real-time streaming pipelines and applications, moving data from source to destination quickly and reliably.
Apache Kafka’s Core Advantages
- Low Latency: Kafka delivers low end-to-end latency even for very large data streams, so events are available to consumers almost as soon as they are produced.
- Streamlined Real-Time Messaging: Kafka decouples producers from consumers and stores messages efficiently, enabling real-time publishing, subscribing, and processing of data.
- Scalability: Kafka’s distributed architecture scales horizontally to absorb surges in data speed and volume, with no downtime.
- Reliability: Kafka replicates partitions and distributes them across brokers, making the cluster fault-tolerant and the data durable.
- Integration Prowess: Kafka integrates with data-processing frameworks and platforms such as Spark, Storm, Hadoop, and AWS, making it a natural backbone for real-time data pipelines.
Apache Kafka – Cluster Architecture
Kafka Cluster Architecture Components:
Broker: The Communication Hub: A broker is a server that runs Kafka and handles message traffic between producers and consumers. Multiple brokers collaborate to form a Kafka cluster.
Event: Bytes of Wisdom: Events are the messages flowing into and out of a Kafka broker. They are stored as bytes on the broker’s disk.
Producer and Consumer: The Messengers: Producers write events to Kafka, while consumers read them. A single service can play both roles.
Topic: The Organized Repository: Topics categorize events, akin to folders in a file system. Each holds messages of a specific type, like “payment-details” or “user-details.”
Partition: Dividing the Load: A topic can be divided into partitions so that brokers and consumers can share the load, increasing throughput. A partition is the smallest storage unit, holding a subset of a topic’s messages.
Replication Factor: Backup Brigade: Replicas are copies of partitions. A topic’s replication factor determines how many copies of each partition are stored in the cluster. For instance, a topic with one partition and a replication factor of 2 results in two identical copies of that partition stored on different brokers.
Offset: The Bookmark: An offset is Kafka’s consumption marker, like a bookmark in a story. It tracks which events a consumer has already read, helping the consumer pick up where it left off, even after an interruption.
Zookeeper: Kafka’s Co-Pilot and Protector: Zookeeper, an integral companion to Kafka, manages cluster ACLs, stores consumer offsets (in older Kafka versions), monitors broker status, and oversees client quotas, keeping the cluster healthy and secure.
Consumer Group: Unity for Efficient Consumption: In Kafka’s world, consumers unite as a consumer group to collectively consume messages from designated topics. Each consumer in the group handles a distinct set of partitions, so no message is processed twice within the group. This collaboration boosts consumption speed, especially when multiple consumers share a topic (see the CLI sketch after this list).
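To make these concepts concrete, here is a minimal sketch using the CLI tools that ship with Kafka. The topic name payment-details, the group payment-service, and the bootstrap address localhost:9092 are illustrative assumptions, not part of the deployment described later.
# Create a topic with 3 partitions, each replicated to 2 brokers
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic payment-details --partitions 3 --replication-factor 2
# Describe the topic to see partition leaders and replica placement
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic payment-details
# Inspect a consumer group's committed offsets and lag, per partition
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group payment-service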
Installation
K8s Deployment Steps
1. Create a project directory with any name and go inside it; in this example we use kafka-deployment.
mkdir kafka-deployment
cd kafka-deployment
2. Download the Strimzi operator 0.29.0 release archive.
wget https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.29.0/strimzi-0.29.0.zip
3. Unzip the Strimzi operator archive and go inside the folder.
unzip strimzi-0.29.0.zip
cd strimzi-0.29.0/
4. Edit the Strimzi installation files to use the namespace the Cluster Operator will be installed into. In this procedure, the Cluster Operator is installed into the namespace kafka.
sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
5. Edit the install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml file and set the STRIMZI_NAMESPACE environment variable to all namespaces (*), giving the Cluster Operator permission to watch all namespaces.
# ...
env:
  - name: STRIMZI_NAMESPACE
    value: "*"
# ...
6. Create ClusterRoleBindings that grant cluster-wide access for all namespaces to the Cluster Operator.
kubectl create clusterrolebinding strimzi-cluster-operator-namespaced --clusterrole=strimzi-cluster-operator-namespaced --serviceaccount kafka:strimzi-cluster-operator
kubectl create clusterrolebinding strimzi-cluster-operator-watched --clusterrole=strimzi-cluster-operator-watched --serviceaccount kafka:strimzi-cluster-operator
kubectl create clusterrolebinding strimzi-cluster-operator-entity-operator-delegation --clusterrole=strimzi-entity-operator --serviceaccount kafka:strimzi-cluster-operator
7. Deploy the Cluster Operator to your Kubernetes cluster. From inside the strimzi-0.29.0 directory, run the command below.
kubectl create -f install/cluster-operator/ -n kafka
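Before continuing, it is worth confirming that the Cluster Operator deployment has come up (assuming the kafka namespace from step 4):
kubectl get deployments -n kafka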
8. Download the Kafka deployment file and edit it as per your requirements (number of replicas, memory, advertisedPort). Then apply it with the command below.
kubectl apply -f kafka-sample.yaml -n <namespace>
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka
spec:
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"
  kafka:
    config:
      default.replication.factor: 3
      fetch.message.max.bytes: 10485760
      inter.broker.protocol.version: "3.0"
      max.message.bytes: 10485760
      message.max.bytes: 10485760
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      replica.fetch.max.bytes: 10485760
      transaction.state.log.min.isr: 2
      transaction.state.log.replication.factor: 3
    listeners:
      - name: external
        port: 9094
        tls: false
        type: nodeport
        configuration:
          brokers:
            - broker: 0
              advertisedHost: example.com
              advertisedPort: 5002
            - broker: 1
              advertisedHost: example.com
              advertisedPort: 5003
            - broker: 2
              advertisedHost: example.com
              advertisedPort: 5004
    replicas: 4
    resources:
      requests:
        memory: 500Mi
        cpu: 400m
      limits:
        memory: 800Mi
        cpu: 600m
    storage:
      type: jbod
      volumes:
        - deleteClaim: false
          id: 0
          size: 5Gi
          type: persistent-claim
    version: 3.0.0
  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 200Mi
        cpu: 200m
      limits:
        memory: 400Mi
        cpu: 400m
    storage:
      deleteClaim: false
      size: 1Gi
      type: persistent-claim
9. Verify that all pods in the kafka namespace are running.
kubectl get all -n kafka
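Optionally, instead of relying on automatically created topics, you can create a topic declaratively via the Topic Operator enabled by the entityOperator section above. A minimal KafkaTopic sketch, assuming the cluster name kafka from the sample file and a hypothetical topic my-topic:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: kafka  # must match the name of the Kafka resource
spec:
  partitions: 3
  replicas: 3
Apply it with kubectl apply -f my-topic.yaml -n <namespace>; the Topic Operator then creates the topic inside the cluster.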
Connecting to Kafka
Producer connection:
1. Execute the following command to create a producer and send text messages.
kubectl run -n <namespace> kafka-producer -ti --image=quay.io/strimzi/kafka:0.27.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server kafka-kafka-external-bootstrap.<namespace>.svc.cluster.local:9094 --topic <topic-name>
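Each line you type at the > prompt of the console producer is sent to the topic as a separate message.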
Consumer connection:
1. In another terminal, run the command below to create a consumer that will receive messages from the topic.
kubectl run -n <namespace> kafka-consumer -ti --image=quay.io/strimzi/kafka:0.27.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server kafka-kafka-external-bootstrap.<namespace>.svc.cluster.local:9094 --topic <topic-name> --from-beginning
Finally, we have live streaming in action: messages sent by the producer appear in the consumer terminal in real time.
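As an additional check, you can list the topics present in the cluster with the same client image (assuming the same external bootstrap service used above):
kubectl run -n <namespace> kafka-topics -ti --image=quay.io/strimzi/kafka:0.27.0-kafka-2.8.0 --rm=true --restart=Never -- bin/kafka-topics.sh --bootstrap-server kafka-kafka-external-bootstrap.<namespace>.svc.cluster.local:9094 --list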