What Problem Does Kafka Solve?
In a traditional microservices architecture built on synchronous calls, when an order is placed the Order Service must notify each downstream service in turn: Inventory, Payment, Notification, and Analytics. Any of those calls can fail, and if the Notification Service is down, the whole order fails. The services are tightly coupled.
With Kafka, the Order Service publishes one event: "OrderPlaced". All downstream services subscribe to this event independently. If Notification Service is down, it catches up on missed events when it restarts. The Order Service does not care who is listening — it just publishes and moves on.
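The catch-up behaviour described above can be sketched with a toy in-memory log. This is only an illustration of the idea, not the Kafka client API: `EventLog`, `Subscriber`, and `poll` are hypothetical names, and a real deployment would store the log on brokers rather than in a Python list.

```python
# Toy in-memory model of a shared log with independent readers.
# All class and method names here are hypothetical, not Kafka's API.
class EventLog:
    def __init__(self):
        self._events = []

    def publish(self, event):
        """Append an event and return its offset. The publisher never calls subscribers."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        """Return every event at or after the given offset."""
        return self._events[offset:]


class Subscriber:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0  # position of the next unread event

    def poll(self):
        """Return everything published since this subscriber's last poll."""
        new_events = self.log.read_from(self.offset)
        self.offset += len(new_events)
        return new_events


log = EventLog()
inventory = Subscriber("inventory", log)
notification = Subscriber("notification", log)

log.publish({"type": "OrderPlaced", "order_id": 1})
inventory.poll()  # inventory is up and handles order 1

# The notification service is "down": it simply does not poll.
log.publish({"type": "OrderPlaced", "order_id": 2})
inventory.poll()

# After restarting, it reads both events it missed.
missed = notification.poll()
print([e["order_id"] for e in missed])  # [1, 2]
```

The publisher never references either subscriber, which is the decoupling the paragraph describes: each reader tracks its own position and reads at its own pace.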
Kafka Architecture
Core Concepts
Topics and Partitions
A topic is a named log — like a table in a database. You publish events to topics and subscribe to them. A topic is split into partitions — ordered, immutable sequences of events. Partitions enable parallelism: multiple consumers can read different partitions simultaneously.
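As a rough mental model, a topic can be pictured as a list of append-only logs, one per partition, each with its own offsets. The sketch below assumes that simplified picture; the `Topic` class and its methods are hypothetical and ignore brokers, replication, and retention.

```python
# Simplified model: a topic is a fixed set of append-only partition logs.
# The Topic class is a hypothetical illustration, not part of any Kafka client.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, event):
        """Append to one partition and return the event's offset within it."""
        log = self.partitions[partition]
        log.append(event)
        return len(log) - 1

    def read(self, partition, offset):
        """Return a read-only view of the events from the given offset onward."""
        return tuple(self.partitions[partition][offset:])


orders = Topic("orders", num_partitions=3)
first = orders.append(0, "order-1 placed")
second = orders.append(0, "order-1 paid")
other = orders.append(1, "order-2 placed")

print(first, second, other)      # 0 1 0
print(orders.read(0, offset=1))  # ('order-1 paid',)
```

Offsets restart at 0 in every partition, so ordering is only defined within a partition, not across the whole topic.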
Producers
Producers publish messages to Kafka topics. They can attach a partition key: with the default partitioner, messages with the same key go to the same partition, as long as the topic's partition count does not change. This guarantees ordering for related events (for example, all events for the same user_id land in one partition, in order).
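Key-based routing usually comes down to hashing the key and taking the result modulo the partition count. The sketch below uses CRC32 as a stand-in hash (Kafka's Java client uses murmur2 on the key bytes), and `partition_for` and `NUM_PARTITIONS` are hypothetical names chosen for illustration.

```python
import zlib

NUM_PARTITIONS = 6  # hypothetical partition count for this example


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition. CRC32 is a stand-in, not Kafka's actual hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("user_42", "login"),
    ("user_7", "login"),
    ("user_42", "add_to_cart"),
    ("user_42", "checkout"),
    ("user_7", "logout"),
]
for key, action in events:
    partitions[partition_for(key)].append((key, action))

# Every event for user_42 sits in one partition, in publish order.
user_42_log = partitions[partition_for("user_42")]
user_42_actions = [action for key, action in user_42_log if key == "user_42"]
print(user_42_actions)  # ['login', 'add_to_cart', 'checkout']
```

Because the partition is derived from the key and the partition count, changing the number of partitions changes where a key lands, which is why per-key ordering only holds while the count stays fixed.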
Consumers and Consumer Groups
A consumer group is a set of consumers that collectively read a topic. Each partition is assigned to exactly one consumer in the group at a time, although one consumer may own several partitions. Adding consumers increases parallelism because each one reads fewer partitions, but only up to the number of partitions: any consumers beyond that sit idle. Multiple groups can independently read the same topic.
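A minimal sketch of how partitions might be divided within one group, assuming a plain round-robin split. Kafka ships several assignment strategies that differ in detail, so `assign_partitions` is a hypothetical helper, not the broker's actual algorithm.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin split: every partition gets exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 6 partitions, 3 consumers: each consumer reads 2 partitions.
balanced = assign_partitions(6, ["c1", "c2", "c3"])
print(balanced)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# 2 partitions, 3 consumers: the third consumer has nothing to read.
oversized = assign_partitions(2, ["c1", "c2", "c3"])
print(oversized)  # {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why the partition count caps parallelism within a group: once every partition has an owner, extra consumers receive no work.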
Kafka vs RabbitMQ vs Redis Pub/Sub
| Property | Kafka | RabbitMQ | Redis Pub/Sub |
|---|---|---|---|
| Pattern | Log-based streaming | Message queue (AMQP) | Fire-and-forget pub/sub |
| Message retention | Yes (days/weeks) | Until consumed | No — lost if no subscriber |
| Message replay | Yes (rewind offset) | No | No |
| Multiple consumers | Yes (consumer groups) | Competing consumers (one gets it) | All subscribers get it |
| Throughput | Millions/sec | Thousands/sec | Very high (in-memory) |
| Ordering | Per-partition guaranteed | Per-queue guaranteed | No guarantee |
| Complexity | High | Medium | Low |
| Best for | Event streaming, audit logs, analytics | Task queues, work distribution | Real-time notifications, caching |
Common Kafka Use Cases
- Microservices integration: Services communicate via events — decoupled, resilient, independently scalable
- Event sourcing: Store every state change as an event — rebuild state by replaying events
- Real-time analytics: Stream events to Apache Flink, Spark Streaming, or ksqlDB for real-time aggregation
- Log aggregation: Centralise application logs from hundreds of services into Kafka, then ship to Elasticsearch/S3
- Change Data Capture (CDC): Kafka Connect + Debezium streams every database row change as an event
- Activity tracking: LinkedIn uses Kafka to track user activity (page views, clicks) at 7 trillion messages/day
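The event-sourcing item above can be illustrated with a small fold over an event list. This is a sketch under simple assumptions: the event shapes (`Deposited`, `Withdrawn`) and the helper names are invented for the example, and a real system would read the events from a topic rather than a Python list.

```python
# Hypothetical account events, standing in for records read from a topic.
events = [
    {"type": "Deposited", "account": "acc-1", "amount": 100},
    {"type": "Withdrawn", "account": "acc-1", "amount": 30},
    {"type": "Deposited", "account": "acc-2", "amount": 50},
    {"type": "Deposited", "account": "acc-1", "amount": 5},
]


def apply_event(balances, event):
    """Fold one event into the current state and return the new state."""
    delta = event["amount"] if event["type"] == "Deposited" else -event["amount"]
    updated = dict(balances)
    updated[event["account"]] = updated.get(event["account"], 0) + delta
    return updated


def rebuild(event_log):
    """Rebuild state from scratch by replaying every event in order."""
    state = {}
    for event in event_log:
        state = apply_event(state, event)
    return state


print(rebuild(events))      # {'acc-1': 75, 'acc-2': 50}
print(rebuild(events[:2]))  # {'acc-1': 70}
```

Replaying a prefix of the log, as in the second call, reconstructs the state as it was at that point in history, which is one of the main attractions of storing events rather than only the latest values.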
Managed Kafka Services
Running Kafka yourself is operationally complex. Managed services simplify this significantly: Confluent Cloud (Kafka as a Service), Amazon MSK (Managed Streaming for Kafka), Azure Event Hubs (Kafka-compatible), Redpanda (Kafka-compatible, much simpler ops). For most teams, a managed service is the right starting point.
Kafka Is Not a Database
Kafka stores events durably but is not queryable like a database. You cannot run SELECT queries on a Kafka topic. For querying event history, ship events to a data warehouse (Snowflake, BigQuery) or use ksqlDB for stream processing. Kafka is a transport and temporary storage layer — not a long-term data store for complex queries.
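One common pattern is to load consumed events into a store that does support queries. The sketch below uses an in-memory SQLite database purely as a stand-in for a warehouse such as Snowflake or BigQuery; the event fields and table layout are assumptions made for the example.

```python
import sqlite3

# Hypothetical order events, standing in for records consumed from a topic.
events = [
    {"order_id": 1, "customer": "alice", "total": 30.0},
    {"order_id": 2, "customer": "bob", "total": 12.5},
    {"order_id": 3, "customer": "alice", "total": 7.5},
]

db = sqlite3.connect(":memory:")  # SQLite stands in for a real warehouse
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

for event in events:
    # INSERT OR REPLACE keeps the load idempotent if an event arrives twice.
    db.execute(
        "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
        (event["order_id"], event["customer"], event["total"]),
    )
db.commit()

rows = db.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 37.5), ('bob', 12.5)]
```

The aggregate query at the end is the kind of question a topic cannot answer directly: the log only supports sequential reads by offset.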
Frequently Asked Questions — Apache Kafka
What is Apache Kafka?
Apache Kafka is a distributed event streaming platform: a high-throughput, durable, fault-tolerant publish-subscribe message system. Producers publish events (messages) to topics. Consumers subscribe to topics and process events. Unlike traditional message queues, Kafka retains messages for a configurable period (days/weeks) and allows multiple consumer groups to independently read the same stream. It was originally built at LinkedIn, open-sourced in 2011, and is now used by 80%+ of Fortune 100 companies.
What are topics and partitions?
A topic is a named log of events (like a database table for events). A partition is a subdivision of a topic: each topic is split into N partitions for parallelism. Events in a partition are ordered and immutable. Producers write to a topic (Kafka routes each event to a partition). Consumers read from partitions. More partitions = more parallelism = higher throughput. Events with the same key always go to the same partition (ordering guarantee within a key).
What is a consumer group?
A consumer group is a set of consumers that collaborate to consume a topic. Each partition is consumed by exactly ONE consumer in a group at a time, which enables parallel processing. If you have a topic with 6 partitions and 3 consumers in a group, each consumer reads 2 partitions. Each consumer group keeps its own independent cursor, so different groups consume the same events independently. This is how multiple downstream systems can all process the same event stream.
When should you use Kafka, and when RabbitMQ?
Choose Kafka for high throughput (millions of messages/sec), event streaming and logs, message replay and retention, multiple consumers reading the same stream, event sourcing, and audit logs. Choose RabbitMQ for a traditional task queue (work distribution among workers), complex routing (exchanges, bindings), and messages that should be deleted after consumption; it offers lower throughput but more routing flexibility. Rule of thumb: if you need a task queue where messages are consumed and gone, use RabbitMQ. If you need a durable event stream that many systems read from, use Kafka.
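How does Kafka provide durability and fault tolerance?
Each partition is replicated across multiple brokers; the number of copies is the replication factor. With a replication factor of N, a partition has one leader (which handles reads and writes by default) and N-1 followers (which replicate its data). If the leader fails, one of the in-sync followers is elected as the new leader automatically. Kafka persists all messages to an on-disk log rather than holding them only in memory, and replication protects against the loss of a single broker. Producers can configure acks: acks=all means the leader waits for all in-sync replicas to confirm before acknowledging the write. Together, replication and acks=all make acknowledged writes highly durable.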
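The effect of the acks setting on failover can be sketched with a toy model of one replicated partition. The `ReplicatedPartition` class is hypothetical and heavily simplified: in this toy, a write with acks="1" never reaches the followers, whereas real followers replicate asynchronously, so real data loss only occurs if the leader fails before they catch up.

```python
# Toy model of one replicated partition; not how brokers are implemented.
class ReplicatedPartition:
    def __init__(self, brokers):
        self.logs = {broker: [] for broker in brokers}  # one log copy per broker
        self.alive = set(brokers)
        self.leader = brokers[0]

    def write(self, message, acks="all"):
        """Append to the leader; with acks="all", copy to live followers before acknowledging."""
        self.logs[self.leader].append(message)
        if acks == "all":
            for broker in self.alive - {self.leader}:
                self.logs[broker].append(message)
        return True  # acknowledgement sent to the producer

    def fail(self, broker):
        """Take a broker offline; if it was the leader, promote the most complete live replica."""
        self.alive.discard(broker)
        if broker == self.leader:
            if not self.alive:
                raise RuntimeError("no replica left to lead the partition")
            self.leader = min(self.alive, key=lambda b: (-len(self.logs[b]), b))


partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
partition.write("m1", acks="all")  # replicated before the acknowledgement
partition.write("m2", acks="1")    # acknowledged by the leader only

partition.fail("broker-1")         # the leader dies
survivors = partition.logs[partition.leader]
print(partition.leader, survivors)  # broker-2 ['m1']
```

The message written with acks="all" survives the leader failure, while the leader-only write is gone, which is the trade-off between latency and durability that the acks setting controls.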
When should you not use Kafka?
Do not use Kafka for: low-volume messaging where a simpler queue (Redis Pub/Sub, SQS) suffices, scenarios requiring complex message routing logic (RabbitMQ exchanges are better), when you need messages to disappear after processing and do not want retention overhead, when your team lacks Kafka expertise (operational complexity is high), or for small projects where the infrastructure overhead exceeds the benefit. Kafka shines at scale; for low-volume apps, the complexity is rarely justified.