Roadmap for Kafka

Here’s a structured roadmap of the Kafka topics you should master to ace interviews, grouped by level and area:


1. Core Fundamentals

  1. What is Kafka & Use-Cases

    • Pub/Sub vs. queue messaging

    • Real-time streaming vs. batch processing

  2. Architecture Overview

    • Brokers, clusters, ZooKeeper (or KRaft)

    • Topics, partitions, replicas, leaders vs. followers
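To get a feel for how partitions, replicas, and leaders relate, here is a toy sketch that spreads partition leaders and followers across brokers round-robin. This is an illustration only; Kafka's actual replica-assignment algorithm (and rack awareness) is more involved.

```python
# Toy sketch: spread partition leaders and follower replicas across brokers
# round-robin. Illustration only -- Kafka's real assignment is more involved.

def assign_replicas(num_partitions, num_brokers, replication_factor):
    """Return {partition: [leader_broker, follower_broker, ...]}."""
    assignment = {}
    for p in range(num_partitions):
        # Leader rotates across brokers; followers are the next brokers in line.
        replicas = [(p + i) % num_brokers for i in range(replication_factor)]
        assignment[p] = replicas
    return assignment

layout = assign_replicas(num_partitions=6, num_brokers=3, replication_factor=2)
for partition, replicas in layout.items():
    print(f"partition {partition}: leader=broker-{replicas[0]}, followers={replicas[1:]}")
```

Note how every broker leads some partitions and follows others, which is how Kafka balances load across the cluster.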


2. Producers & Consumers

  1. Producer API

    • Synchronous vs. asynchronous sends

    • Keyed vs. round-robin partitioning

  2. Consumer API

    • Groups and group management

    • Offset commits (automatic vs. manual)

    • Rebalancing and partition assignment strategies
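Keyed partitioning is worth being able to sketch on a whiteboard: the same key always hashes to the same partition, which is what preserves per-key ordering. Kafka's default partitioner hashes the serialized key with murmur2; md5 below is just a dependency-free stand-in.

```python
import hashlib

# Sketch of key-based partitioning: the same key always lands on the same
# partition, which is what preserves per-key ordering. Kafka's default
# partitioner uses murmur2 on the serialized key; md5 is a stand-in here.

def partition_for(key: bytes, num_partitions: int) -> int:
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Records with the same key go to the same partition...
assert partition_for(b"user-42", 3) == partition_for(b"user-42", 3)
# ...while different keys spread across partitions.
print({k: partition_for(k, 3) for k in [b"user-1", b"user-2", b"user-3"]})
```

Records with a null key are instead spread across partitions (sticky/round-robin), trading ordering for throughput.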


3. Data Modeling & Serialization

  1. Message Key & Value

    • Why keys matter for ordering and compaction

  2. Serialization Formats

    • String, JSON, Avro, Protobuf, JSON-Schema

    • Schema Registry basics & compatibility settings
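The serializer/deserializer contract can be sketched minimally with JSON (Avro or Protobuf would additionally register the schema with a Schema Registry and embed a schema id in the payload):

```python
import json

# Minimal serializer/deserializer pair for JSON values. Kafka itself only
# ever sees bytes; producer and consumer are configured with matching
# (de)serializers on each side.

def serialize(value: dict) -> bytes:
    return json.dumps(value, separators=(",", ":")).encode("utf-8")

def deserialize(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))

event = {"user_id": 42, "action": "login"}
raw = serialize(event)            # what the producer puts on the wire
assert deserialize(raw) == event  # what the consumer reconstructs
```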


4. Delivery Semantics & Transactions

  1. “At most once,” “At least once,” “Exactly once”

    • How retries, acks, and idempotence work

  2. Transactions

    • initTransactions(), beginTransaction(), commitTransaction(), abortTransaction()

    • Use cases and limitations
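A typical idempotent, transactional producer configuration looks roughly like this (client-side settings; the transactional.id value is just an example):

```properties
# Producer settings for exactly-once delivery into Kafka.
# Deduplicate retried sends per partition:
enable.idempotence=true
# Wait for all in-sync replicas to acknowledge each send:
acks=all
# transactional.id is an example value; setting it is what enables the
# initTransactions()/beginTransaction()/commitTransaction() API.
transactional.id=orders-tx-1
```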


5. Cluster Operations & Administration

  1. Topic Management

    • Creating topics, partitions, replication factor

    • Topic-level configs (cleanup.policy, retention.ms)

  2. Broker Configuration

    • Key server.properties settings (log.dirs, listeners, controller configs)

  3. Scaling & High Availability

    • Adding/removing brokers

    • Leader election, ISR (in-sync replicas)
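The broker-side settings above typically live in server.properties. An abridged KRaft-era sketch, with placeholder paths and hostnames:

```properties
# Abridged broker config sketch (placeholder values).
node.id=1
# Where partition logs are stored on disk:
log.dirs=/var/lib/kafka/data
# How clients reach this broker:
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://broker1.example.com:9092
# KRaft controller quorum (a matching controller listener is omitted here):
controller.quorum.voters=1@broker1.example.com:9093
# Defaults for newly created topics:
num.partitions=3
default.replication.factor=2
```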


6. Security & Compliance

  1. Authentication

    • SSL/TLS, SASL (PLAIN, SCRAM, GSSAPI/Kerberos)

  2. Authorization

    • ACLs with kafka-acls.sh

  3. Encryption & Auditing

    • Encrypting data in transit and at rest
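On the broker side, TLS plus SASL/SCRAM authentication boils down to a few server.properties entries; a sketch with placeholder paths and passwords:

```properties
# Broker listener secured with TLS + SASL/SCRAM (placeholder values).
listeners=SASL_SSL://0.0.0.0:9093
sasl.enabled.mechanisms=SCRAM-SHA-256
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/etc/kafka/ssl/broker.truststore.jks
ssl.truststore.password=changeit
```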


7. Monitoring & Performance Tuning

  1. Key Metrics

    • Broker: CPU, disk, network, request handlers

    • Consumer & cluster: consumer lag, under-replicated partitions

  2. Tuning Parameters

    • num.network.threads, fetch.min.bytes, compression.type

  3. Tools

    • JMX, Prometheus + Grafana, Confluent Control Center
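Consumer lag, the metric interviewers ask about most, is just the gap between each partition's log end offset and the group's committed offset, summed over partitions. A sketch with made-up sample offsets:

```python
# Consumer lag per partition = log end offset - committed offset.
# The offsets below are made-up sample numbers for illustration.

def total_lag(end_offsets: dict, committed: dict) -> int:
    """Sum of per-partition lag for one consumer group on one topic."""
    return sum(end_offsets[p] - committed.get(p, 0) for p in end_offsets)

end_offsets = {0: 1_500, 1: 1_480, 2: 1_510}  # latest offset per partition
committed   = {0: 1_500, 1: 1_200, 2: 1_505}  # group's committed offsets
print(total_lag(end_offsets, committed))       # -> 285, mostly on partition 1
```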


8. Ecosystem & Advanced Features

  1. Kafka Connect

    • Source vs. sink connectors

    • Distributed vs. standalone mode

  2. Kafka Streams & KSQL

    • Stateless vs. stateful transformations

    • Windowing, joins, aggregations

  3. Tiered Storage & MirrorMaker

    • Cross-data-center replication (MirrorMaker 2)

    • Cold storage integration
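The tumbling-window aggregation that Kafka Streams expresses with windowedBy(...) can be illustrated in plain Python; the timestamps and window size below are made up:

```python
from collections import defaultdict

# Sketch of a tumbling-window count: each event falls into exactly one
# fixed-size window based on its timestamp. Kafka Streams does the same
# (with state stores and fault tolerance) via windowedBy().

WINDOW_MS = 60_000  # 1-minute tumbling windows

def windowed_counts(events):
    """events: iterable of (timestamp_ms, key) -> {(window_start, key): count}"""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // WINDOW_MS) * WINDOW_MS
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(5_000, "clicks"), (59_000, "clicks"), (61_000, "clicks")]
print(windowed_counts(events))
# The first two events share the [0s, 60s) window; the third starts a new one.
```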


9. Real-World Patterns & Best Practices

  • Schema evolution strategies

  • Error handling (DLQs, retry topics)

  • Idempotent consumers/producers

  • Event design (event sourcing, CQRS)
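The retry/DLQ pattern and idempotent consumption can be sketched together. In-memory lists stand in for the retry and dead-letter topics, and `MAX_ATTEMPTS` and the handlers are made-up examples:

```python
# Sketch: retry a failing handler a bounded number of times, then route the
# record to a dead-letter queue; skip records already processed (idempotence).
# Lists stand in for Kafka topics; handlers and MAX_ATTEMPTS are made up.

MAX_ATTEMPTS = 3
dead_letter_queue = []
processed_ids = set()  # idempotent-consumer bookkeeping

def consume(record, handler):
    if record["id"] in processed_ids:  # duplicate delivery -> no-op
        return "skipped"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(record)
            processed_ids.add(record["id"])
            return "ok"
        except Exception:
            if attempt == MAX_ATTEMPTS:
                dead_letter_queue.append(record)  # park for inspection
                return "dead-lettered"

def flaky(record):
    raise RuntimeError("downstream unavailable")

print(consume({"id": 1, "body": "hello"}, lambda r: None))  # -> ok
print(consume({"id": 1, "body": "hello"}, lambda r: None))  # -> skipped
print(consume({"id": 2, "body": "boom"}, flaky))            # -> dead-lettered
```

In a real deployment the dead-letter queue would be another Kafka topic, and the processed-id set would live in a durable store keyed by a business identifier.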


10. Hands-On & Sample Questions

  • Write a producer that sends JSON to a topic with 3 partitions.

  • How would you handle a consumer that’s fallen far behind?

  • Explain what happens during a broker failure.

  • Describe how exactly-once semantics work end-to-end.

  • Sketch an end-to-end flow using Kafka Connect from MySQL to Elasticsearch.


Next Steps:

  1. Pick a section each day and build a mini demo project.

  2. Practice whiteboarding common failure-recovery and scaling scenarios.

  3. Review official docs and try out Confluent’s free sandbox.

Good luck—you’ve got this! 🚀
