Q: How do Consumer Groups and Offsets work in Kafka?
Answer:
Consumer groups are Kafka's mechanism for parallel consumption and load balancing.
Consumer Groups
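A consumer group is a set of consumers that cooperate to consume messages from a topic. Each partition is assigned to exactly one consumer within a group, so each message is delivered to only one consumer per group. This is not an exactly-once guarantee: after a crash or rebalance, messages read since the last committed offset can be delivered again.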
Topic "orders" (4 partitions)
Consumer Group "payment-service" (3 consumers):
Consumer A ← Partition 0, Partition 1
Consumer B ← Partition 2
Consumer C ← Partition 3
Key Rule: If the number of consumers exceeds the number of partitions, the extra consumers sit idle.
4 partitions, 6 consumers:
Consumer A ← Partition 0
Consumer B ← Partition 1
Consumer C ← Partition 2
Consumer D ← Partition 3
Consumer E ← IDLE ❌
Consumer F ← IDLE ❌
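The two layouts above can be reproduced with a small simulation. This is a minimal sketch of a range-style split, not Kafka's actual assignor code; the function name `assign_partitions` is hypothetical, and real clients choose a strategy through the `partition.assignment.strategy` setting, which may produce a different layout.

```python
def assign_partitions(num_partitions, consumers):
    """Range-style split: contiguous blocks, leftovers go to the first consumers."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    base, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    next_partition = 0
    for index, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(next_partition, next_partition + count))
        next_partition += count
    return assignment


three = assign_partitions(4, ["A", "B", "C"])
six = assign_partitions(4, ["A", "B", "C", "D", "E", "F"])
# Consumers with an empty list own no partition and receive no messages.
idle = [name for name, parts in six.items() if not parts]

print(three)  # {'A': [0, 1], 'B': [2], 'C': [3]}
print(idle)   # ['E', 'F']
```

Because a partition is never shared inside a group, adding consumers beyond the partition count adds no throughput; the topic needs more partitions first.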
Multiple Consumer Groups
Different consumer groups consume the same topic independently. Each group maintains its own committed offsets, so groups do not interfere with each other.
Topic "orders" (3 partitions) →
Consumer Group "payment-service" → reads all 3 partitions independently
Consumer Group "email-service" → reads all 3 partitions independently
Consumer Group "analytics" → reads all 3 partitions independently
This is what makes Kafka a publish-subscribe system, not just a queue.
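The independence of groups can be illustrated with a toy in-memory model. This sketch uses a Python list for one partition of "orders" and a dictionary for per-group positions; the `poll` helper and the record values are hypothetical and do not use any Kafka client library.

```python
# One partition of "orders"; the list index plays the role of the offset.
log = ["order-0", "order-1", "order-2", "order-3"]

# Each group keeps its own next-offset-to-read for this partition.
group_offsets = {"payment-service": 0, "email-service": 0, "analytics": 0}


def poll(group, max_records):
    """Return up to max_records for one group and advance only that group's offset."""
    start = group_offsets[group]
    records = log[start:start + max_records]
    group_offsets[group] = start + len(records)
    return records


payment_batch = poll("payment-service", 3)
email_batch = poll("email-service", 1)

print(group_offsets)
# {'payment-service': 3, 'email-service': 1, 'analytics': 0}
```

Reading in one group leaves the other groups' positions untouched, and the records stay in the log for every group to read.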
Offsets
An offset is a sequential integer that uniquely identifies each message within a partition. Offsets are how consumers track what they've already read.
Partition 0: [0] [1] [2] [3] [4] [5] [6] [7] [8]
                                  ↑
                     Consumer's current offset = 5
                     (has read 0-4, will read 5 next)
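The position in the diagram can be traced with a toy single-partition consumer. This is a sketch under simplifying assumptions: the `PartitionConsumer` class is hypothetical, not a real Kafka client, and the point it shows is that the stored position is the offset of the next record to read.

```python
class PartitionConsumer:
    """Toy consumer for one partition; not a real Kafka client."""

    def __init__(self, log, committed=0):
        self.log = log
        self.position = committed  # offset of the next record to read

    def poll(self, max_records=1):
        records = self.log[self.position:self.position + max_records]
        self.position += len(records)
        return records

    def commit(self):
        # The committed value is the next offset to read, not the last one read.
        return self.position


partition_0 = [f"msg-{i}" for i in range(9)]   # offsets 0..8
consumer = PartitionConsumer(partition_0)
consumer.poll(max_records=5)                   # reads offsets 0-4
committed = consumer.commit()

# A restarted consumer resumes from the committed offset.
restarted = PartitionConsumer(partition_0, committed)
next_record = restarted.poll()

print(committed)    # 5
print(next_record)  # ['msg-5']
```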
Where Are Offsets Stored?
Consumer offsets are committed to an internal Kafka topic called __consumer_offsets (50 partitions by default). It is a log-compacted topic managed by Kafka itself, so only the latest committed offset for each group, topic, and partition needs to be retained.
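What compaction means for committed offsets can be sketched as a keyed "last write wins" reduction. This is an illustration only: the `compact` function is hypothetical, the commit values are made up, and the real record format and cleaner behavior inside Kafka are more involved.

```python
def compact(commit_log):
    """Keep only the most recent offset per (group, topic, partition) key."""
    latest = {}
    for group, topic, partition, offset in commit_log:
        latest[(group, topic, partition)] = offset  # newer commits overwrite older ones
    return latest


# Hypothetical sequence of commits, oldest first.
commits = [
    ("payment-service", "orders", 0, 1000),
    ("payment-service", "orders", 1, 987),
    ("payment-service", "orders", 0, 1024),
    ("email-service", "orders", 0, 512),
]
state = compact(commits)

print(state[("payment-service", "orders", 0)])  # 1024
print(len(state))                               # 3
```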
# Check committed offsets for a group
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group payment-service
Output:
GROUP            TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
payment-service  orders  0          1024            1030            6
payment-service  orders  1          987             987             0
[!IMPORTANT] The LAG column is critical for monitoring. It is LOG-END-OFFSET minus CURRENT-OFFSET, i.e. how many messages are waiting to be processed. Steadily increasing lag means the group's consumers cannot keep up with the producers.
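The LAG values in the sample output follow from a simple subtraction per partition. A minimal sketch using the numbers from that output; the dictionary layout and the `partition_lag` helper are hypothetical, not part of any Kafka tool.

```python
# Rows copied from the sample --describe output for group "payment-service".
rows = [
    {"partition": 0, "current_offset": 1024, "log_end_offset": 1030},
    {"partition": 1, "current_offset": 987, "log_end_offset": 987},
]


def partition_lag(row):
    """Lag = newest offset in the partition minus the group's committed offset."""
    return row["log_end_offset"] - row["current_offset"]


lags = {row["partition"]: partition_lag(row) for row in rows}
total_lag = sum(lags.values())

print(lags)       # {0: 6, 1: 0}
print(total_lag)  # 6
```

For alerting, the trend matters more than a single reading: a lag that stays small is normal, while one that grows across successive checks suggests the group is falling behind.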