Q: Explain Brokers, Topics, and Partitions in Kafka.

Answer:

These are the three fundamental building blocks of Kafka's architecture.

Broker

A broker is a single Kafka server. A Kafka cluster consists of multiple brokers (typically 3+). Each broker:

  • Stores data on disk
  • Serves producer and consumer requests
  • Participates in replication
  • Is identified by a unique integer ID

Brokers are designed so that no single broker holds all the data for a topic — data is distributed across brokers via partitions.

Topic

A topic is a named category/feed to which messages are published. Think of it as a table in a database or a folder in a filesystem.

Topics: "user-signups", "order-events", "payment-transactions"

Topics are multi-subscriber — many consumer groups can read from the same topic independently without affecting each other.

Partition

Each topic is split into one or more partitions. A partition is an ordered, immutable sequence of messages (an append-only log). Each message within a partition gets a sequential ID called an offset.

Topic: "orders" (3 partitions)

Partition 0: [msg0] [msg1] [msg2] [msg3] [msg4] → 
Partition 1: [msg0] [msg1] [msg2] →
Partition 2: [msg0] [msg1] [msg2] [msg3] →

Why Partitions Matter

1. Parallelism Each partition can be consumed by a different consumer in a consumer group. More partitions = more consumers = higher throughput.

2. Ordering Messages are strictly ordered WITHIN a partition, but there is no ordering guarantee ACROSS partitions. If ordering matters for a specific entity (e.g., all events for user X), you must ensure all events for that entity go to the same partition using a partition key.

3. Distribution Partitions are spread across brokers. For a topic with 6 partitions on a 3-broker cluster, each broker holds ~2 partitions.

How They Relate

Kafka Cluster
├── Broker 0
│   ├── orders-partition-0 (Leader)
│   └── orders-partition-2 (Follower)
├── Broker 1
│   ├── orders-partition-1 (Leader)
│   └── orders-partition-0 (Follower)
└── Broker 2
    ├── orders-partition-2 (Leader)
    └── orders-partition-1 (Follower)

[!IMPORTANT] Choosing the right number of partitions is a critical design decision. Too few = throughput bottleneck. Too many = increased memory usage, slower leader elections, and longer recovery times. A common starting point is number of partitions = desired throughput / throughput per partition (usually a few MB/s per partition).