Q: Why is Kafka so fast?

Answer:

Kafka achieves extraordinary throughput (millions of messages/second) through several deliberate architectural decisions.

1. Sequential I/O (Append-Only Log)

Kafka writes messages to disk in a strictly sequential, append-only fashion, so the hot write path involves no random disk seeks.

Sequential disk writes are remarkably fast — on spinning disks, sequential throughput can reach hundreds of MB/s while random writes on the same drive may manage only ~100 KB/s (the classic comparison cited in Kafka's own design documentation). The OS can apply read-ahead and write-behind buffering, and on HDDs the drive head never has to seek.
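The append-only write path can be sketched as follows. This is an illustrative toy, not Kafka's actual segment format: each record is length-prefixed and appended strictly at the end of the file, so the disk only ever sees sequential writes.

```python
import os
import struct
import tempfile

class AppendLog:
    """Toy append-only log: length-prefixed records, sequential writes only."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")  # append mode: every write lands at the end

    def append(self, payload: bytes):
        self.f.write(struct.pack(">I", len(payload)))  # 4-byte length prefix
        self.f.write(payload)
        self.f.flush()

    def read_all(self):
        records = []
        with open(self.path, "rb") as r:
            while True:
                header = r.read(4)
                if len(header) < 4:
                    break
                (n,) = struct.unpack(">I", header)
                records.append(r.read(n))
        return records

path = os.path.join(tempfile.mkdtemp(), "segment.log")
log = AppendLog(path)
for msg in [b"a", b"bb", b"ccc"]:
    log.append(msg)
print(log.read_all())  # [b'a', b'bb', b'ccc']
```

Because nothing is ever updated in place, recovery and replication are also simple: the log is just a file you copy forward.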

2. Zero-Copy (sendfile)

When a consumer reads data, a traditional read() + send() path involves 4 copies (and extra context switches between kernel and user space):

Disk → Kernel Buffer → User Space → Socket Buffer → NIC

Kafka uses the Linux sendfile() system call to skip user space entirely:

Disk → Kernel Buffer → NIC  (zero-copy, 2 copies instead of 4)

This eliminates context switches and redundant memory copies, cutting CPU usage and increasing throughput dramatically. (Caveat: with TLS enabled, data must be encrypted in user space, so the zero-copy path no longer applies.)
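A hedged sketch of the same mechanism: Python's `socket.sendfile()` uses the `os.sendfile()` zero-copy path on Linux (falling back to an ordinary send loop elsewhere), the same kernel facility Kafka's broker reaches via Java's `FileChannel.transferTo`. Here a file is streamed to a loopback consumer without the payload ever passing through user space:

```python
import os
import socket
import tempfile
import threading

payload = b"kafka log segment bytes " * 1000
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
received = bytearray()

def consume():
    conn, _ = server.accept()
    while chunk := conn.recv(65536):
        received.extend(chunk)
    conn.close()

t = threading.Thread(target=consume)
t.start()

with socket.create_connection(("127.0.0.1", port)) as sock, open(path, "rb") as f:
    sent = sock.sendfile(f)  # file -> socket; kernel-to-kernel on Linux

t.join()
server.close()
os.unlink(path)
print(sent == len(payload), bytes(received) == payload)  # True True
```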

3. Page Cache (OS-Level Caching)

Kafka does NOT manage its own in-memory cache. Instead, it relies on the OS page cache. When data is written to disk, the OS caches it in RAM. When consumers read recent data, it's served directly from the page cache — essentially a free, automatically managed in-memory read cache.

Hot data (recent):  Served from OS page cache (RAM speed)
Cold data (old):    Read from disk (still fast due to sequential reads)

This is why Kafka's performance is often counter-intuitive: it writes to "disk" but reads from "memory."
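A rough probe of this effect (assumed behavior, heavily machine-dependent): the first read of a file may touch disk, while a repeat read is typically served straight from the OS page cache. Timings vary with cache pressure, so treat the numbers as illustrative only:

```python
import os
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "hot.log")
data = os.urandom(16 * 1024 * 1024)  # 16 MiB of "recent" log data
with open(path, "wb") as f:
    f.write(data)

def timed_read():
    start = time.perf_counter()
    with open(path, "rb") as f:
        contents = f.read()
    return contents, time.perf_counter() - start

first, t1 = timed_read()   # may hit disk (or already be cached from the write)
second, t2 = timed_read()  # typically served from the page cache
print(f"first: {t1*1e3:.1f} ms, second: {t2*1e3:.1f} ms")
```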

4. Batching + Compression

Producers batch many messages together and optionally compress them. This means:

  • Fewer network round trips
  • Less disk I/O (one write for many messages)
  • Smaller on-disk footprint
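The batching win can be sketched like this (assumed message shapes, not Kafka's wire format): many small records are accumulated, serialized once, and compressed as a single unit, amortizing per-request overhead and shrinking repetitive payloads:

```python
import gzip
import json

# 1000 small, repetitive events — typical producer traffic.
messages = [{"user_id": i, "event": "click", "page": "/home"} for i in range(1000)]

# Serialize the whole batch once, then compress it as one unit.
batch = b"\n".join(json.dumps(m).encode() for m in messages)
compressed = gzip.compress(batch)

print(len(batch), "->", len(compressed), "bytes")  # repetitive JSON compresses well

# One network send / one log append covers all 1000 messages.
restored = [json.loads(line) for line in gzip.decompress(compressed).split(b"\n")]
```

Compressing per-batch rather than per-message is what makes the ratio good: the compressor sees the redundancy across records.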

5. Partitioning (Horizontal Scaling)

Each partition is an independent log. Multiple partitions can be read and written in parallel across different brokers and consumers, so adding partitions and brokers scales throughput near-linearly (up to the limits of key skew and coordination overhead).
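A sketch of key-based partition assignment (Kafka's Java client uses murmur2 for this; `zlib.crc32` stands in here as the hash). The same key always maps to the same partition, which is what preserves per-key ordering while spreading load:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the record key, mod partition count (murmur2 in the real client).
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 6
keys = [f"user-{i}".encode() for i in range(100)]
assignment = {k: partition_for(k, NUM_PARTITIONS) for k in keys}

# Deterministic: the same key always lands on the same partition.
print(partition_for(b"user-7", NUM_PARTITIONS) == assignment[b"user-7"])  # True
```

Note the flip side: changing the partition count changes the mapping, which is why repartitioning an existing topic breaks key ordering guarantees.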

6. No Per-Message Acknowledgment to Consumers

Unlike brokers such as RabbitMQ, which track delivery state per message, Kafka consumers simply track their offset (position) in the log. The broker keeps no per-message delivery state, which removes a large class of bookkeeping overhead.
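A minimal sketch of offset-based consumption: the broker holds an immutable log, and the consumer's entire state is one integer. Re-delivery is just "rewind the offset", with no per-message acks anywhere:

```python
log = [f"msg-{i}" for i in range(10)]  # broker-side partition log (immutable)

class Consumer:
    def __init__(self):
        self.offset = 0  # the only consumption state that exists

    def poll(self, max_records=3):
        records = log[self.offset : self.offset + max_records]
        self.offset += len(records)  # advance position; nothing acked per record
        return records

c = Consumer()
print(c.poll())  # ['msg-0', 'msg-1', 'msg-2']
c.offset = 1     # seek/rewind: replay from any position
print(c.poll())  # ['msg-1', 'msg-2', 'msg-3']
```

In real Kafka the committed offset lives in the `__consumer_offsets` topic, but the principle is the same: one number per consumer group per partition.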

Summary

| Technique | Benefit |
| --- | --- |
| Sequential I/O | Fast disk writes, no seeks |
| Zero-copy (sendfile) | Minimal CPU for data transfer |
| Page cache | Hot data served from RAM |
| Batching | Amortized network/disk overhead |
| Compression | Less bandwidth and storage |
| Partition parallelism | Near-linear horizontal scaling |
| Offset-based tracking | No per-message broker state |

> [!TIP]
> In interviews, the two killer points are sequential I/O and zero-copy. These are what fundamentally differentiate Kafka's performance from traditional message brokers that rely on random I/O and per-message routing.