Running Apache Kafka in production is notoriously complex, but I’ve found that deploying it on Kubernetes using Bitnami’s Helm chart is a reliable way to get a battle-tested setup. The chart handles the tricky parts like persistent storage, ZooKeeper coordination, and external access. This is my personal guide to deploying a production-ready Kafka cluster on Kubernetes using this approach.
For a production setup, I always use multiple Kafka brokers (at least 3, but often more) and a separate ZooKeeper ensemble (3 or 5 nodes) for coordination. Everything is backed by persistent storage to ensure data durability.
My Deployment Process
Here’s the workflow I follow for a new Kafka deployment.
Step 1: Preparation
I always start by creating a dedicated kafka namespace to keep things isolated. Then I add the Bitnami Helm repository.
kubectl create namespace kafka
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
Step 2: My Go-To Configuration
The most critical part is the configuration. I’ve developed a go-to values.yaml file for my production deployments. Here are the most important sections:
# values.yaml
# I start with 3 brokers for a solid production setup
replicaCount: 3
# I always enable persistence for both Kafka and ZooKeeper
persistence:
  enabled: true
  size: 40Gi
  storageClass: "standard"

zookeeper:
  enabled: true
  replicaCount: 3
  persistence:
    enabled: true
    size: 8Gi

# This is crucial for allowing clients outside Kubernetes to connect
externalAccess:
  enabled: true
  service:
    type: LoadBalancer
    port: 9094

# I set resource requests and limits to ensure stability
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

# Anti-affinity ensures brokers are spread across different cluster nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - kafka
          topologyKey: kubernetes.io/hostname
Step 3: Installation and Verification
With my values file ready, I run helm install. I then watch the pods come up one by one. It can take a few minutes for all the brokers and ZooKeeper nodes to become ready.
helm install kafka bitnami/kafka \
  --namespace kafka \
  --values values.yaml
# Watch the deployment progress
kubectl get pods -n kafka -w
After installation, I verify everything by checking the services (kubectl get svc -n kafka) and persistent volume claims (kubectl get pvc -n kafka). I make sure that a LoadBalancer service has been created for each broker’s external access.
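In practice that verification is just a couple of kubectl commands (the external service names noted below are what the chart typically generates; yours may differ slightly):
# Each broker should get its own LoadBalancer service (typically named like kafka-0-external)
kubectl get svc -n kafka
# One bound PVC per Kafka broker and per ZooKeeper node
kubectl get pvc -n kafka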
Step 4: Testing the Cluster
To make sure everything is working, I test the cluster by exec-ing into a client pod, creating a topic with a replication factor of 3, and running a simple console producer and consumer.
# Run a client pod
kubectl run kafka-client --restart='Never' --image docker.io/bitnami/kafka:latest -n kafka --command -- sleep infinity
# Exec into the pod
kubectl exec --tty -i kafka-client -n kafka -- bash
# Inside the pod, create a topic
kafka-topics.sh --create \
  --bootstrap-server kafka:9092 \
  --replication-factor 3 \
  --partitions 6 \
  --topic test-topic
# Produce some messages
kafka-console-producer.sh --bootstrap-server kafka:9092 --topic test-topic
> My first message
# In another terminal, consume the messages
kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic test-topic --from-beginning
How I Tune for Production
My base values.yaml is a starting point. For a high-throughput environment, I’ll significantly increase the CPU and memory resources and the persistent volume size. For security, which is a must in production, I enable SASL authentication and TLS encryption directly in the Helm chart’s values. It’s much easier than configuring it manually later.
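As a concrete example, my high-throughput overrides on top of the base values.yaml look roughly like the sketch below. The sizes are illustrative, and the exact keys for SASL/TLS vary between chart versions, so I check the chart's values documentation for the version I'm deploying rather than guessing them:
# High-throughput overrides (illustrative sizes -- tune to your workload)
persistence:
  size: 500Gi

resources:
  requests:
    memory: "8Gi"
    cpu: "4000m"
  limits:
    memory: "16Gi"
    cpu: "8000m"

# SASL and TLS are enabled through the chart's auth-related values;
# the key names differ across chart versions, so consult the docs for yours.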
Monitoring and Disaster Recovery
For monitoring, I enable the Prometheus metrics endpoints in the Helm chart and scrape them with our Prometheus Operator. I keep a close eye on consumer lag and disk usage, as those are often the first signs of trouble.
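In the Helm values that usually comes down to a few metrics flags plus a ServiceMonitor so the Prometheus Operator discovers the endpoints automatically. The keys below match the chart versions I've used, but they're worth verifying against yours, and the monitoring namespace is an assumption:
metrics:
  kafka:
    enabled: true          # kafka-exporter: consumer lag, topic and partition metrics
  jmx:
    enabled: true          # JMX exporter sidecar: broker JVM and disk metrics
  serviceMonitor:
    enabled: true          # ServiceMonitor resource for the Prometheus Operator
    namespace: monitoring  # assumption: where my Prometheus Operator runs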
For disaster recovery, my primary strategy is ensuring all critical topics have a replication factor of at least 3. I also use my cloud provider’s volume snapshot capabilities to back up the PVCs, and for cross-cluster DR, I use MirrorMaker 2.
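For the PVC backups, a VolumeSnapshot per broker data volume is usually enough. A minimal sketch, assuming a CSI snapshot class named csi-snapclass and the chart's default PVC naming (data-kafka-0 for broker 0), looks like this:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: kafka-0-data-snapshot
  namespace: kafka
spec:
  volumeSnapshotClassName: csi-snapclass      # assumption: your cluster's CSI snapshot class
  source:
    persistentVolumeClaimName: data-kafka-0   # assumption: PVC name for broker 0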
Final Thoughts
Using the Bitnami Helm chart has made running Kafka on Kubernetes manageable for me. My key takeaways are to always use a dedicated namespace, start with a robust values.yaml that includes persistence and external access, and set a replication factor of at least 3 for all production topics. I also can’t overstate the importance of setting up monitoring from day one, especially for disk usage and consumer lag. This approach has given me a solid, repeatable foundation for all my Kafka deployments.
Related Articles
- Flux2 GitOps - Automate Kafka deployments with Flux
- GKE Cluster Operations - Provision clusters for Kafka
- MuleSoft on Kubernetes - Integrate Kafka with MuleSoft