▶Kafka vs RabbitMQ β which should I use?
Kafka: a persistent, replayable log with very high throughput (trillions of events/day at the largest deployments), topic-based pub/sub, distributed by default. RabbitMQ: a traditional message queue with immediate delivery, lower latency for small workloads, and simpler operations. Use Kafka for data pipelines (logging, event streaming, analytics); use RabbitMQ for task queues (job processing, microservice messaging). Kafka is the de facto standard for event-driven architectures at scale.
▶What is exactly-once semantics and why is it hard?
Exactly-once = every event processed exactly one time, no duplicates and no loss. Hard because: (1) distributed systems can fail mid-processing, (2) state must be stored atomically with the consumer offset, (3) retries must be idempotent. Kafka Streams handles this via transactional writes: the result and the offset commit go into a single transaction. Cost: additional latency plus state store overhead. Enable with `processing.guarantee: exactly_once_v2` (not the deprecated `exactly_once` v1).
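A minimal config sketch for enabling EOS v2; the `application.id` and `bootstrap.servers` values are placeholders, and string keys are used here instead of the `StreamsConfig` constants so the snippet has no Kafka dependency:

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    // Build a minimal Kafka Streams configuration with exactly-once v2 enabled.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "orders-pipeline");       // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092");     // placeholder broker
        props.put("processing.guarantee", "exactly_once_v2"); // EOS v2 (Kafka 3.0+)
        // With EOS, commits happen at transaction boundaries, so the commit
        // interval also bounds the latency added by transactional writes.
        props.put("commit.interval.ms", "100");
        return props;
    }
}
```

In real code you would pass this `Properties` object straight to the `KafkaStreams` constructor.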
▶How do I handle late-arriving data and out-of-order events?
Windowing with a grace period delays window closure to catch late events: `TimeWindows.ofSizeAndGrace(Duration.ofMinutes(1), Duration.ofMinutes(5))` keeps each one-minute window open for five extra minutes (the older `.until()` API is deprecated). Out-of-order: use event timestamps (not processing time) via a custom `TimestampExtractor`. For critical accuracy: wider windows (hourly vs per-minute) and state stores for deduplication. Because aggregation is keyed by event time, replaying historical data reproduces the same windowed results.
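The grace-period rule reduces to plain arithmetic. This is an illustrative sketch of what size-and-grace windowing decides internally, not Kafka Streams code; the class and method names are made up for the example:

```java
public class GraceWindowing {
    // Align an event's *event-time* timestamp to its tumbling window start.
    public static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // A window [start, start + size) stays open until observed stream time
    // passes start + size + grace; events arriving after that are dropped
    // as "too late" and typically counted in a late-record metric.
    public static boolean accepted(long eventTimeMs, long streamTimeMs,
                                   long windowSizeMs, long graceMs) {
        long windowEnd = windowStart(eventTimeMs, windowSizeMs) + windowSizeMs;
        return streamTimeMs < windowEnd + graceMs;
    }
}
```

E.g. an event stamped at 65s belongs to the [60s, 120s) one-minute window; with a 5-minute grace it is still accepted while stream time is below 420s.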
▶Kafka Streams vs Apache Flink β when do I pick which?
Kafka Streams: a library embedded in your app, stateful, runs next to your data, simpler for Kafka-native pipelines, Java/Scala only. Flink: a framework with a dedicated cluster, higher operational overhead, but first-class SQL, polyglot APIs (Java/Python/SQL), and stronger complex event processing (CEP). Kafka Streams wins if you control the app and the pipeline is Kafka-to-Kafka; Flink wins if you need a shared cluster serving multiple teams and use cases.
▶How do I evolve schemas without breaking consumers?
Use Schema Registry with Avro/Protobuf and compatibility checks enabled. `BACKWARD` means a new schema can read old data, `FORWARD` means old consumers can read new data, `FULL` means both. Add new fields as optional with defaults. Never rename or remove fields without a deprecation window. Test schema changes against `.avsc` files locally before pushing to production.
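As a sketch, here is a hypothetical `UserEvent` Avro schema after a compatible evolution: the `referrer` field is new, nullable, and carries a default, so new readers fill in the default when decoding old records (BACKWARD) and old readers simply ignore the extra field in new records (FORWARD):

```json
{
  "type": "record",
  "name": "UserEvent",
  "fields": [
    {"name": "user_id",  "type": "string"},
    {"name": "action",   "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": null}
  ]
}
```

Removing `referrer` later, or renaming `action`, would break one direction or the other; hence the deprecation window.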
▶When should I NOT use Kafka?
Don't use Kafka for: (1) request-response patterns (use HTTP/gRPC), (2) sub-10ms latency requirements (broker round-trips alone cost 5-10ms), (3) small data volumes (Kafka's overhead plus cluster cost doesn't pay off under ~10k events/sec), (4) transactional consistency across services (use distributed transactions or sagas instead), (5) single-machine deployments (use SQLite + polling).
▶How do I monitor consumer lag and detect problems?
Consumer lag = log-end offset minus committed offset, per partition. Monitor via Confluent Control Center, Prometheus (scraping JMX metrics), or the Kafka Admin API. Alerting: lag growing for over an hour = investigate. Check: (1) are the consumers running, (2) is processing stuck (check app logs), (3) is the topic still receiving data. Consumer parallelism is capped at the partition count; more partitions = more parallelism. Use a dedicated lag-monitoring tool (Burrow, Kafka Exporter) for multi-cluster visibility.
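The lag formula above, as a self-contained sketch. In production the two offset maps would typically come from `AdminClient.listOffsets()` and `AdminClient.listConsumerGroupOffsets()`; here they are plain inputs so the arithmetic stands alone:

```java
import java.util.Map;

public class LagMonitor {
    // Per-partition lag: log-end offset minus committed offset, clamped at
    // zero (a commit can briefly race ahead of a stale end-offset read).
    public static long partitionLag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    // Total lag for a consumer group: sum over partitions. A partition with
    // no committed offset yet is treated as fully lagged from offset 0.
    public static long totalLag(Map<Integer, Long> endOffsets,
                                Map<Integer, Long> committed) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long c = committed.getOrDefault(e.getKey(), 0L);
            total += partitionLag(e.getValue(), c);
        }
        return total;
    }
}
```

Alert on the trend (total lag growing steadily), not a single snapshot: a momentary spike during a traffic burst is normal.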