Q: Why is Kafka so fast?
Answer:
Kafka achieves extraordinary throughput (millions of messages/second) through several deliberate architectural decisions.
1. Sequential I/O (Append-Only Log)
Kafka writes messages to disk in a strictly sequential, append-only fashion; writes never require random disk seeks.
Sequential disk writes are remarkably fast. A commonly cited benchmark (referenced in Kafka's own design documentation) measured roughly 600 MB/s for sequential writes on an array of 7200 rpm spinning disks versus about 100 KB/s for random writes, several orders of magnitude apart. The OS can fully exploit read-ahead and write-behind buffering, and on HDDs the head never has to move.
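A toy version of the append-only pattern can be sketched in Python (class and field names here are illustrative; Kafka's real segment format is more involved):

```python
import os, struct, tempfile

# Minimal sketch of an append-only log segment. Every write lands at the
# end of the file, so the disk only ever sees sequential I/O; a small
# in-memory index maps logical offsets to byte positions for reads.
class AppendOnlyLog:
    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")      # append mode: writes never seek
        self.positions = []            # logical offset -> byte position

    def append(self, payload: bytes) -> int:
        self.positions.append(self.f.tell())
        self.f.write(struct.pack(">I", len(payload)) + payload)  # length-prefixed record
        self.f.flush()
        return len(self.positions) - 1  # logical offset of this record

    def read(self, offset: int) -> bytes:
        with open(self.path, "rb") as r:
            r.seek(self.positions[offset])
            (size,) = struct.unpack(">I", r.read(4))
            return r.read(size)

log = AppendOnlyLog(os.path.join(tempfile.mkdtemp(), "00000000.log"))
for msg in (b"first", b"second", b"third"):
    log.append(msg)
print(log.read(1))  # b'second'
```

The design choice to show: producers only ever hit `append`, so the write path is pure sequential I/O, while the seek cost is confined to reads of old (cold) data.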
2. Zero-Copy (sendfile)
When a consumer reads data, the conventional read() + send() path involves four copies and several user/kernel context switches:
Disk → page cache (kernel) → user-space buffer → socket buffer → NIC
Kafka instead uses the Linux sendfile() system call (surfaced in the JVM as FileChannel.transferTo()) to bypass user space entirely:
Disk → page cache (kernel) → NIC (two copies instead of four, both DMA)
This removes the CPU-bound copies and most of the context switches, cutting CPU usage and raising throughput dramatically. One caveat: with TLS enabled, the broker must encrypt data in user space, so zero-copy no longer applies.
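A minimal Linux/macOS sketch of the same idea using Python's `os.sendfile` wrapper (a local socketpair stands in for a real consumer connection; the kernel moves the bytes straight from file to socket without copying them into this process's memory):

```python
import os, socket, tempfile

# Zero-copy transfer demo: file -> socket via the kernel, no user-space copy.
payload = os.urandom(4096)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

server, client = socket.socketpair()         # stand-in for a consumer TCP connection
with open(path, "rb") as src:
    sent = 0
    while sent < len(payload):               # sendfile may transfer fewer bytes per call
        sent += os.sendfile(server.fileno(), src.fileno(), sent, len(payload) - sent)

received = b""
while len(received) < len(payload):
    received += client.recv(65536)
assert received == payload                   # delivered intact, zero user-space copies
os.remove(path)
```

The data still lands in the consumer correctly; what changed is that the application never touched it, which is where the CPU savings come from.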
3. Page Cache (OS-Level Caching)
Kafka does NOT manage its own in-memory cache. It relies on the OS page cache instead. Writes land in the page cache first and the OS flushes them to disk in the background (by default Kafka leans on replication, rather than an fsync per message, for durability). When consumers read recent data, it is served directly from the page cache: essentially a free, automatically managed in-memory read cache that even survives broker process restarts.
Hot data (recent): Served from OS page cache (RAM speed)
Cold data (old): Read from disk (still fast due to sequential reads)
This is why Kafka's performance is often counter-intuitive: it writes to "disk" but reads from "memory."
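The effect can be seen with nothing but the standard library (timings are machine-dependent and purely illustrative; the point is that a read of just-written data never waits on the disk):

```python
import os, time

# Sketch of "write to disk, read from memory": the write below lands in the
# OS page cache first, so an immediate read of the same data is served from
# RAM, not the platter/flash.
path = "pagecache_demo.bin"
data = os.urandom(8 * 1024 * 1024)           # 8 MB of "recent" data
with open(path, "wb") as f:
    f.write(data)                            # populates the page cache as it writes

t0 = time.perf_counter()
with open(path, "rb") as f:
    readback = f.read()                      # hot read: served from page cache
elapsed = time.perf_counter() - t0

print(f"read 8 MB in {elapsed * 1000:.1f} ms (no fsync was ever issued)")
os.remove(path)
```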
4. Batching + Compression
Producers batch many messages together and optionally compress them. This means:
- Fewer network round trips
- Less disk I/O (one write for many messages)
- Smaller on-disk footprint
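The batching idea can be sketched with stdlib gzip (the message shapes and sizes here are made up for illustration; Kafka's real record-batch format and codecs differ):

```python
import gzip, json

# Producer-side batching + compression: many small messages become one
# compressed payload, sent in a single network round trip and written to
# disk in a single sequential write.
messages = [{"user_id": i, "event": "click", "page": "/home"} for i in range(1000)]

raw = b"".join(json.dumps(m).encode() + b"\n" for m in messages)
batch = gzip.compress(raw)                   # one payload for 1000 messages

print(f"{len(raw)} bytes raw -> {len(batch)} bytes on the wire/disk")

# Consumer side: one decompress recovers the entire batch.
restored = [json.loads(line) for line in gzip.decompress(batch).splitlines()]
assert restored == messages
```

Because log messages tend to be repetitive, batch-level compression ratios are typically far better than compressing each message alone.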
5. Partitioning (Horizontal Scaling)
Each partition is an independent log. Multiple partitions can be written and read in parallel across different brokers and consumers, so adding partitions and brokers scales throughput close to linearly (until coordination, replication, and rebalancing overhead start to bite).
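Key-based partition assignment can be sketched as follows (Kafka's default partitioner hashes the key with murmur2; `md5` here is just an illustrative stand-in for a stable hash):

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: bytes, n: int = NUM_PARTITIONS) -> int:
    # Stable hash of the key, modulo the partition count.
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % n

# The same key always lands on the same partition (preserving per-key order);
# different keys spread across partitions, enabling parallel reads and writes.
assert partition_for(b"user-42") == partition_for(b"user-42")
used = {partition_for(f"user-{i}".encode()) for i in range(1000)}
print(f"1000 keys spread over partitions: {sorted(used)}")
```

This is also why ordering in Kafka is guaranteed only *within* a partition: parallelism is bought by giving up a global order.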
6. No Per-Message Acknowledgment to Consumers
Unlike brokers such as RabbitMQ, which track delivery state per message, Kafka consumers simply track their position (offset) in the log and periodically commit it. The broker keeps no per-message delivery state, which eliminates a huge amount of bookkeeping overhead.
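The contrast can be made concrete with a tiny sketch (names are illustrative): the only broker-side state per consumer group is one committed offset per partition, not an ACK flag per message.

```python
# A partition's append-only log, and a consumer group's entire broker-side state.
log = [f"msg-{i}" for i in range(10)]

class GroupOffsets:
    def __init__(self):
        self.committed = 0                   # a single integer of state

    def poll(self, max_records: int = 3):
        # Hand back the next records after the committed offset.
        return log[self.committed : self.committed + max_records]

    def commit(self, num_consumed: int):
        self.committed += num_consumed       # advance one number, nothing per-message

g = GroupOffsets()
batch = g.poll()
g.commit(len(batch))
print("consumed:", batch, "| next offset:", g.committed)
```

A useful side effect of this model: rewinding the offset replays old messages for free, something per-message ACK brokers cannot easily do.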
Summary
| Technique | Benefit |
|---|---|
| Sequential I/O | Fast disk writes, no seeks |
| Zero-copy (sendfile) | Minimal CPU for data transfer |
| Page cache | Hot data served from RAM |
| Batching | Amortized network/disk overhead |
| Compression | Less bandwidth and storage |
| Partition parallelism | Linear horizontal scaling |
| Offset-based tracking | No per-message broker state |
> [!TIP]
> In interviews, the two killer points are sequential I/O and zero-copy. These are what fundamentally differentiate Kafka's performance from traditional message brokers that rely on random I/O and per-message routing.