Q: What is Schema Registry and why is Avro commonly used with Kafka?

Answer:

The Problem: Schema Evolution

In a microservices architecture, producers and consumers are developed by different teams and deployed at different times. What happens when the producer changes the message format (adds a field, renames one, changes a type)?

Without schema management, the consumer breaks because it can't deserialize the new format.

Schema Registry

The Confluent Schema Registry is a centralized service that stores and manages schemas for Kafka message keys and values. It ensures that producers and consumers agree on the data format.

```
Producer → Schema Registry: "Here's my schema, give me an ID"
           Schema Registry → "Schema ID: 42"
Producer → Kafka: [Schema ID: 42] + [Serialized Data]
                       ...
Consumer ← Kafka: [Schema ID: 42] + [Serialized Data]
Consumer → Schema Registry: "What schema is ID 42?"
           Schema Registry → Returns the schema
Consumer: Deserializes data using the schema
```
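Note that each message carries only a schema ID, never the schema itself. Producers prepend the Confluent wire format header: one magic byte (0), then the schema ID as a 4-byte big-endian integer, then the serialized payload. A minimal sketch of that framing (the real serializers in client libraries such as confluent-kafka do this internally):

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: magic byte + 4-byte schema ID + payload


def frame_message(schema_id: int, payload: bytes) -> bytes:
    """Prepend the Confluent framing header to an already-serialized payload."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def parse_message(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown framing (missing magic byte)")
    return schema_id, message[5:]


framed = frame_message(42, b"\x06foo")
schema_id, payload = parse_message(framed)
```

The consumer reads those first five bytes, looks the ID up in Schema Registry (caching it locally), and only then deserializes the payload.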

Why Avro?

Apache Avro is a compact binary serialization format and the most common choice for Kafka messages. It pairs naturally with Schema Registry because every Avro payload is written and read against an explicit schema.

Avro schema example:

```json
{
    "type": "record",
    "name": "OrderEvent",
    "namespace": "com.example",
    "fields": [
        {"name": "orderId", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "USD"},
        {"name": "timestamp", "type": "long"}
    ]
}
```
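To see what this schema means at the byte level, here is a toy sketch of Avro's binary encoding rules for this record, per the Avro specification: zigzag varints for longs, length-prefixed UTF-8 for strings, 8-byte little-endian IEEE 754 for doubles. In production you would use the avro or fastavro library; this is only an illustration of the encoding:

```python
import struct


def encode_long(n: int) -> bytes:
    """Avro long: zigzag-encoded, then variable-length base-128 bytes."""
    n = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes -> small encodings
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: length (encoded as a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_order_event(order_id: str, amount: float, currency: str, ts: int) -> bytes:
    """Fields are written in schema order; no field names appear in the output."""
    return (encode_string(order_id)
            + struct.pack("<d", amount)   # Avro double: 8 bytes little-endian
            + encode_string(currency)
            + encode_long(ts))


payload = encode_order_event("A-1001", 49.99, "USD", 1700000000000)
```

Because the fields are positional, the reader needs the writer's schema to make sense of the bytes, which is exactly the gap Schema Registry fills.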

Why not JSON?

| Feature | JSON | Avro | Protobuf |
|---|---|---|---|
| Size | Large (text + keys) | Compact (binary, no keys) | Compact (binary) |
| Schema | None (schema-less) | Required | Required |
| Speed | Slow (parsing text) | Fast (binary) | Fast (binary) |
| Schema evolution | Manual | Built-in | Built-in |
| Human readable | ✅ Yes | ❌ No | ❌ No |

Avro messages are typically 50-70% smaller than JSON because they carry no field names — only the values, written in the order the schema defines.
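The size gap is easy to demonstrate. The sketch below compares a JSON encoding of the order event against a values-only positional layout (loosely mimicking Avro's approach — real Avro uses zigzag varints rather than fixed-width integers):

```python
import json
import struct

event = {"orderId": "A-1001", "amount": 49.99, "currency": "USD",
         "timestamp": 1700000000000}

# JSON repeats every field name in every message.
json_bytes = json.dumps(event).encode("utf-8")

# Positional layout: values only, in schema order (strings length-prefixed).
binary = (struct.pack(">B", len(event["orderId"])) + event["orderId"].encode()
          + struct.pack(">d", event["amount"])
          + struct.pack(">B", len(event["currency"])) + event["currency"].encode()
          + struct.pack(">q", event["timestamp"]))

print(len(json_bytes), len(binary))  # 85 vs 27 bytes: ~68% smaller
```

The field names dominate the JSON size; the binary layout pays only for the values.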

Compatibility Modes

Schema Registry enforces compatibility rules when a schema evolves:

| Mode | Rule |
|---|---|
| BACKWARD (default) | New schema can read old data. Allows: removing fields, adding fields with defaults. |
| FORWARD | Old schema can read new data. Allows: adding fields, removing fields that have defaults. |
| FULL | Both backward and forward compatible. |
| NONE | No compatibility checks. |
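The BACKWARD rule reduces to a structural question: can a reader using the new schema decode data written with the old one? The toy check below captures that idea; the real Schema Registry check additionally handles Avro's type promotions, aliases, and nested records:

```python
def is_backward_compatible(new_schema: dict, old_schema: dict) -> bool:
    """Simplified BACKWARD check: every field the new schema expects must
    either exist in the old schema with the same type, or carry a default."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            if "default" not in field:       # new required field: old data unreadable
                return False
        elif old["type"] != field["type"]:   # toy check: no type promotions
            return False
    return True


v1 = {"fields": [{"name": "orderId", "type": "string"},
                 {"name": "amount", "type": "double"}]}
v2 = {"fields": v1["fields"] + [{"name": "currency", "type": "string",
                                 "default": "USD"}]}
bad = {"fields": v1["fields"] + [{"name": "region", "type": "string"}]}
```

Registering `bad` against a topic in BACKWARD mode is exactly the kind of change Schema Registry rejects before it can break consumers.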

Example of backward-compatible change:

```
// v1
{"name": "orderId", "type": "string"}
{"name": "amount", "type": "double"}

// v2 (backward compatible: new field has a default)
{"name": "orderId", "type": "string"}
{"name": "amount", "type": "double"}
{"name": "currency", "type": "string", "default": "USD"}  // ← NEW
```
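On the consumer side, a v2 reader handles v1 data by filling the missing field from its default. This is a simplified sketch of Avro's schema-resolution behavior (real Avro libraries do this during deserialization):

```python
def read_with_defaults(record: dict, reader_schema: dict) -> dict:
    """Fill any field absent from old data with the reader schema's default."""
    out = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        else:
            # Safe because a backward-compatible schema guarantees a default.
            out[field["name"]] = field["default"]
    return out


v2_schema = {"fields": [{"name": "orderId", "type": "string"},
                        {"name": "amount", "type": "double"},
                        {"name": "currency", "type": "string", "default": "USD"}]}
old_record = {"orderId": "A-1001", "amount": 49.99}   # written with v1

resolved = read_with_defaults(old_record, v2_schema)
```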

> [!TIP]
> In interviews, mentioning Avro + Schema Registry together shows you understand production Kafka. The key insight: Schema Registry acts as a contract between services, preventing breaking changes from deploying to production.