Kafka — Part One

Sanket Saxena
3 min read · Jun 1, 2023


As I begin my journey exploring Apache Kafka, a powerful distributed event streaming platform, I wanted to share my initial findings. Let’s dive into the basic components of Kafka.

Topics

Topics in Kafka are essentially categories or feed names to which records get published.

  • Structure: A topic consists of one or more partitions. Each partition is an ordered, append-only log, and every record within a partition is assigned an incremental ID known as an offset.
  • Immutability: Data within Kafka topics is immutable, signifying it cannot be modified once written.
  • Storage Capacity: Topics can hold an effectively unlimited amount of data — the practical limit is the storage capacity of your brokers.
  • Data Retention: By default, a Kafka cluster retains all published records for seven days (configurable via retention settings), providing ample time to process your data.
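To make the partition/offset idea concrete, here is a toy model of a topic as a set of append-only logs. `SimpleTopic` is purely illustrative — it is not part of any Kafka API — but it shows how offsets and immutability work:

```python
# Toy model of a Kafka topic: each partition is an append-only log,
# and a record's offset is simply its position in that log.
class SimpleTopic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        """Append a record and return its offset (its index in the log)."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition, offset):
        # Records are immutable: reading never modifies the log,
        # and existing entries are never overwritten.
        return self.partitions[partition][offset]

topic = SimpleTopic(num_partitions=3)
o0 = topic.append(0, "first")
o1 = topic.append(0, "second")
print(o0, o1)            # offsets increase monotonically per partition
print(topic.read(0, 0))  # "first"
```

Note that offsets are only meaningful within a single partition — two partitions of the same topic each start counting from zero.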

Producers

Producers in Kafka publish data to the topics of their choice.

  • Partition Allocation: Producers decide which partition to write each record to. When a key is present, the decision is based on a hash of the key, which keeps all records for the same key together.
  • Message Structure: A Kafka message carries a key, a value, a compression type, optional headers, and a timestamp; once stored, it is also identified by its partition and offset.
  • Key Hashing: Producers apply a hash function (murmur2 by default) to the key to determine the destination partition. Messages with the same key will always land in the same partition.
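The key-to-partition mapping can be sketched in a few lines. Kafka's default producer uses murmur2 on the key bytes; here `md5` stands in as a deterministic hash (an assumption for the sketch), but the principle — same key, same partition — is identical:

```python
import hashlib

# Simplified key-based partitioner. Real Kafka uses murmur2;
# md5 is used here only as a stand-in deterministic hash.
def partition_for(key: bytes, num_partitions: int) -> int:
    digest = hashlib.md5(key).digest()
    # Interpret the first 4 bytes as an integer, then take it modulo
    # the partition count to pick a partition.
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for(b"user-42", num_partitions=6)
p2 = partition_for(b"user-42", num_partitions=6)
print(p1 == p2)  # True: the same key always maps to the same partition
```

This is also why per-key ordering holds in Kafka: all records for one key land in one partition, and each partition is an ordered log.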

Consumers

Consumers in Kafka read data from the brokers.

  • Pull Model: Consumers follow a pull model, meaning they request data from brokers.
  • Consumer Groups: Consumers belong to consumer groups. Each consumer within a group reads from a distinct set of topic partitions, so every partition is consumed by exactly one member of the group.
  • Fault Tolerance: If a broker goes down, consumers recover by continuing to read the affected partitions from the brokers that hold replicas.
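The way a group splits partitions among its members can be sketched with a simple round-robin assignment. This mirrors the spirit of Kafka's built-in assignors, not their exact algorithms, and the names (`c1`, `c2`, …) are made up for the example:

```python
# Sketch of consumer-group partition assignment: each partition is
# owned by exactly one consumer in the group, so the group shares
# the work of reading a topic.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # Hand out partitions round-robin across group members.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

result = assign(partitions=[0, 1, 2, 3, 4, 5],
                consumers=["c1", "c2", "c3"])
print(result)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

A consequence worth noting: with more consumers than partitions, the extra consumers sit idle, so partition count caps a group's parallelism.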

Brokers

Brokers are the servers in Kafka that store and process published records.

  • Unique Identification: Each broker is identified by an integer ID and hosts a subset of the cluster’s topic partitions.
  • Scalability: The Kafka system is horizontally scalable because the partitions can be distributed over several brokers in the cluster.

Topic Replication Factor

The replication factor is a vital attribute for maintaining data reliability and resilience.

  • Higher Replication Factor: The replication factor should be greater than 1 and is commonly set to 3 in production, ensuring higher data availability.
  • Data Resilience: Each topic partition is replicated across other brokers in a Kafka cluster, making the data in the Kafka system resilient to failures.
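Replica placement can be sketched as spreading each partition's copies over distinct brokers, so that losing one broker never loses every copy of a partition. This is a simplified round-robin placement, not Kafka's actual assignment algorithm, and the broker IDs are invented for the example:

```python
# Sketch of replica placement: each partition gets `replication_factor`
# copies on distinct brokers.
def place_replicas(num_partitions, brokers, replication_factor):
    placement = {}
    for p in range(num_partitions):
        # Start each partition at a different broker and take the next
        # `replication_factor` brokers, wrapping around the list.
        placement[p] = [brokers[(p + r) % len(brokers)]
                        for r in range(replication_factor)]
    return placement

placement = place_replicas(num_partitions=3,
                           brokers=[101, 102, 103],
                           replication_factor=2)
print(placement)  # {0: [101, 102], 1: [102, 103], 2: [103, 101]}
```

With replication factor 2 here, any single broker failure still leaves one live copy of every partition; a replication factor of 3 tolerates two failures.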

Continuing on our exploration of Apache Kafka, let’s delve further into Zookeeper and KRaft, two significant components of Kafka’s ecosystem.

Zookeeper

Zookeeper, until recent updates in Kafka, played a crucial role in managing various aspects of the system.

  • Broker Management: Zookeeper was responsible for managing brokers, which are the Kafka servers that store and process published records.
  • Leader Elections: Zookeeper also coordinated leader elections. In the event of a leader failure, Zookeeper was responsible for electing a new leader from the set of In-Sync Replicas (ISRs).
  • Cluster Metadata: Zookeeper maintained a significant amount of cluster metadata. This includes the details of topics, partitions, and replicas.
  • Security: Zookeeper was a weaker point in the security model, since clients could connect to it directly.
  • Server Configuration: Zookeeper is typically deployed with an odd number of servers so that a majority quorum can always be reached, with one leader and the rest followers.

KRaft (Kafka Raft Metadata mode)

Introduced in the Kafka 2.8/3.x line, KRaft (Kafka Raft) mode simplifies the Kafka architecture by removing the dependency on Zookeeper.

  • Self-contained Metadata Management: Kafka now manages its own metadata. It eliminates the need for a separate Zookeeper ensemble and enables Kafka to manage the metadata by itself, leading to simplified operations.
  • Quorum Controller: In KRaft mode, cluster metadata is managed by a quorum of controller nodes, which are themselves Kafka servers. One of them is elected the active controller via the Raft protocol.
  • Security: The switch to KRaft mode introduced a unified security model, enhancing the security of the Kafka ecosystem.
  • Faster Controller: With KRaft mode, controller shutdown and recovery time has been improved, reducing the overall time required to handle broker failures.
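For a feel of what KRaft mode looks like in practice, here is a minimal single-node `server.properties` sketch. The values (node ID, ports, log directory) are illustrative, not a recommended production setup:

```properties
# Minimal KRaft-mode configuration sketch — values are illustrative.
process.roles=broker,controller          # this node is both broker and controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-logs
```

The key difference from a Zookeeper-based setup: there is no `zookeeper.connect` entry at all — the controller quorum is made up of Kafka nodes themselves.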

By transitioning from Zookeeper to KRaft, Kafka has become a more self-reliant system, leading to increased efficiency and security. As a beginner exploring Kafka, understanding these components and their evolution provides a clear picture of Kafka’s capabilities and future direction.
