Devops Monk

May 04, 2026 Abhay 5 min read

KRaft Mode: Running Kafka Without ZooKeeper

Why ZooKeeper Had to Go For the first decade of Kafka’s existence, every Kafka cluster required an Apache ZooKeeper cluster to manage metadata: controller election, topic configurations, partition leadership, access control lists, and consumer group state. This created real problems: flowchart TB subgraph OldArchitecture["Old Architecture: Kafka + ZooKeeper"] subgraph ZK["ZooKeeper Cluster (3+ nodes)"] Z1["ZK Node 1"] Z2["ZK Node 2"] Z3["ZK Node 3"] end subgraph Kafka["Kafka Cluster"] K1["Broker 1\n(Controller)"] K2["Broker 2"] K3["

May 04, 2026 Abhay 4 min read

Message Headers: Metadata, Routing, and Custom Header Propagation

What Are Kafka Record Headers? Every Kafka record carries a list of Header objects — key-value pairs of String key and byte[] value. They sit outside the message payload and are ideal for: Trace propagation — carry X-Trace-Id / X-Span-Id across service boundaries Correlation IDs — link a response to a request in async flows Routing metadata — signal which region, tenant, or feature flag applies Schema type hints — __TypeId__ (set automatically by JsonSerializer) Event versioning — indicate schema version without modifying the payload flowchart LR subgraph Record["

May 04, 2026 Abhay 4 min read

Monitoring: Consumer Lag, Micrometer Metrics, and Actuator Integration

What to Monitor in Kafka Production Kafka applications need visibility into: Consumer lag — how many records are unprocessed per partition Throughput — records produced and consumed per second Error rates — listener exceptions, DLT records, retry counts Producer latency — time from send() to broker acknowledgment Rebalance frequency — high rebalance rate signals consumer instability Dependencies  <dependency> <groupId>io.micrometer</groupId> <artifactId>micrometer-registry-prometheus</artifactId> </dependency>  <dependency> <groupId>org.

May 04, 2026 Abhay 4 min read

Non-Blocking Retries: @RetryableTopic, BackOff, and the Retry Topic Chain

The Blocking Retry Problem DefaultErrorHandler retries by seeking back to the failed offset. While retrying, no other records from that partition are consumed — the partition is blocked. For a topic with high throughput, one slow retry can cause significant consumer lag. flowchart TD subgraph Blocking["Blocking Retry (DefaultErrorHandler)"] B1["poll() → [r50, r51, r52, r53]"] B2["process r50 ✓"] B3["process r51 ✗ — retry"] B4["wait 10s... retry r51 ✗"] B5["wait 20s... retry r51 ✗"

May 04, 2026 Abhay 5 min read

Offset Management: Auto-Commit vs Manual Acknowledgment

Why Offset Management Matters The committed offset determines what happens when a consumer restarts. If the offset is committed too early, a crash before processing completes means events are lost. If it is committed too late, a crash after processing but before committing means events are re-processed. flowchart TD subgraph TooEarly["Commit too early → Data Loss"] E1["Commit offset 43"] --> E2["Process record 42"] --> E3["Crash!"] E4["Restart: resume from 43"] --> E5["

May 04, 2026 Abhay 5 min read

Pausing, Resuming, and Stopping Listener Containers

Why Control Container Lifecycle? A running listener consumes from Kafka continuously. In production you need to: Pause consumption when a downstream service is overloaded (back-pressure) Resume once the downstream recovers Stop a container entirely during maintenance or feature flag toggles Restart after a configuration change without redeploying Spring Kafka exposes all of this through KafkaListenerEndpointRegistry and the container’s own lifecycle API. Container States stateDiagram-v2 [*] --> Running : start() Running --> Paused : pause() Paused --> Running : resume() Running --> Stopped : stop() Stopped --> Running : start() Paused --> Stopped : stop() Running — polling Kafka, dispatching to listener Paused — broker connection maintained, consumer heartbeat sent, no new records fetched Stopped — consumer thread terminated, partitions released back to group Paused is preferable to stopped for temporary throttling: it avoids a rebalance and keeps the consumer’s partition assignment intact.

May 04, 2026 Abhay 5 min read

Producer @Bean Configuration: Beyond application.properties

Why @Bean Configuration? application.properties is convenient for a single producer, but insufficient when you need: Multiple producers with different serializers (e.g. one for JSON events, one for Avro) Different settings per environment built at runtime (not just property substitution) Producers sending to different clusters (e.g. primary + DR cluster) Programmatic validation of configuration at startup flowchart TB subgraph PropertiesApproach["application.properties Approach"] P1["Single producer config\nspring.kafka.producer.*\n✓ Simple\n✗ One producer only\n✗ No runtime logic"] end subgraph BeanApproach["

May 04, 2026 Abhay 6 min read

Producer Acknowledgments: acks, min.insync.replicas, and Data Durability

What Are Acknowledgments? When a producer sends a record to a Kafka broker, it can wait for confirmation that the write was received and replicated before considering the send “complete.” The acks setting controls how much confirmation the producer requires. flowchart LR Producer["Producer"] Leader["Partition Leader\n(Broker 1)"] F1["Follower\n(Broker 2)"] F2["Follower\n(Broker 3)"] Producer -->|"ProduceRequest"| Leader Leader -->|"replicate"| F1 Leader -->|"replicate"| F2 Leader -->|"ProduceResponse ✓"| Producer style Producer fill:#3b82f6,color:#fff style Leader fill:#10b981,color:#fff The acknowledgment is the broker’s confirmation to the producer.

May 04, 2026 Abhay 6 min read

Producer Retries: Backoff, Timeouts, and Retry Strategies

Why Producers Need Retries Network errors, leader elections, and broker restarts are normal events in a distributed system. Without retries, a transient broker hiccup causes permanent data loss from the producer’s perspective. With retries, the producer automatically re-sends failed records until either the broker accepts them or a timeout deadline is reached. sequenceDiagram participant Producer participant Leader as Leader (Broker 1) participant NewLeader as New Leader (Broker 2) Producer->>Leader: ProduceRequest (offset 42) Note over Leader: Broker 1 crashes mid-write Leader--xProducer: No response (timeout) Note over Producer: retry.

May 04, 2026 Abhay 4 min read

Request-Reply Pattern with ReplyingKafkaTemplate

When Kafka Needs to Be Synchronous Kafka is designed for asynchronous event streaming. But some flows genuinely need a response: a payment validation service that must confirm before the order proceeds, or a pricing engine that must return the current price before checkout completes. ReplyingKafkaTemplate gives you a blocking send-and-receive call over Kafka without leaving the Kafka ecosystem. How Request-Reply Works sequenceDiagram participant Requester as "Order Service\n(ReplyingKafkaTemplate)" participant Broker participant Replier as "

Spring Kafka Tutorial