Running Apache Kafka in production is notoriously complex, but I’ve found that deploying it on Kubernetes using Bitnami’s Helm chart is a reliable way to get a battle-tested setup. The chart handles the tricky parts like persistent storage, ZooKeeper coordination, and external access. This is my personal guide to deploying a production-ready Kafka cluster on Kubernetes using this approach.
For a production setup, I always use multiple Kafka brokers (at least 3, but often more) and a separate ZooKeeper ensemble (3 or 5 nodes) for coordination. Everything is backed by persistent storage to ensure data durability.
My Deployment Process
Here’s the workflow I follow for a new Kafka deployment.
Step 1: Preparation
I always start by creating a dedicated kafka namespace to keep things isolated. Then I add the Bitnami Helm repository.
kubectl create namespace kafka
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
Step 2: My Go-To Configuration
The most critical part is the configuration. I’ve developed a go-to values.yaml file for my production deployments. Here are the most important sections:
# values.yaml
# I start with 3 brokers for a solid production setup
replicaCount: 3
# I always enable persistence for both Kafka and ZooKeeper
persistence:
  enabled: true
  size: 40Gi
  storageClass: "standard"

zookeeper:
  enabled: true
  replicaCount: 3
  persistence:
    enabled: true
    size: 8Gi

# This is crucial for allowing clients outside Kubernetes to connect
externalAccess:
  enabled: true
  service:
    type: LoadBalancer
    port: 9094

# I set resource requests and limits to ensure stability
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

# Anti-affinity ensures brokers are spread across different cluster nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - kafka
          topologyKey: kubernetes.io/hostname
Step 3: Installation and Verification
With my values file ready, I run helm install. I then watch the pods come up one by one. It can take a few minutes for all the brokers and ZooKeeper nodes to become ready.
helm install kafka bitnami/kafka \
  --namespace kafka \
  --values values.yaml
# Watch the deployment progress
kubectl get pods -n kafka -w
After installation, I verify everything by checking the services (kubectl get svc -n kafka) and persistent volume claims (kubectl get pvc -n kafka). I make sure that a LoadBalancer service has been created for each broker’s external access.
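In practice that verification is just a couple of kubectl commands (the external service names noted below are what the chart typically generates; yours may differ slightly):
# Each broker should get its own LoadBalancer service (typically named like kafka-0-external)
kubectl get svc -n kafka
# One bound PVC per Kafka broker and per ZooKeeper node
kubectl get pvc -n kafka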
Step 4: Testing the Cluster
To make sure everything is working, I test the cluster by exec-ing into a client pod, creating a topic with a replication factor of 3, and running a simple console producer and consumer.
# Run a client pod
kubectl run kafka-client --restart='Never' --image docker.io/bitnami/kafka:latest -n kafka --command -- sleep infinity
# Exec into the pod
kubectl exec --tty -i kafka-client -n kafka -- bash
# Inside the pod, create a topic
kafka-topics.sh --create \
  --bootstrap-server kafka:9092 \
  --replication-factor 3 \
  --partitions 6 \
  --topic test-topic
# Produce some messages
kafka-console-producer.sh --bootstrap-server kafka:9092 --topic test-topic
> My first message
# In another terminal, consume the messages
kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic test-topic --from-beginning
How I Tune for Production
My base values.yaml is a starting point. For a high-throughput environment, I’ll significantly increase the CPU and memory resources and the persistent volume size. For security, which is a must in production, I enable SASL authentication and TLS encryption directly in the Helm chart’s values. It’s much easier than configuring it manually later.
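As a concrete example, my high-throughput overrides on top of the base values.yaml look roughly like the sketch below. The sizes are illustrative, and the exact keys for SASL/TLS vary between chart versions, so I check the chart's values documentation for the version I'm deploying rather than guessing them:
# High-throughput overrides (illustrative sizes -- tune to your workload)
persistence:
  size: 500Gi

resources:
  requests:
    memory: "8Gi"
    cpu: "4000m"
  limits:
    memory: "16Gi"
    cpu: "8000m"

# SASL and TLS are enabled through the chart's auth-related values;
# the key names differ across chart versions, so consult the docs for yours.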
Monitoring and Disaster Recovery
For monitoring, I enable the Prometheus metrics endpoints in the Helm chart and scrape them with our Prometheus Operator. I keep a close eye on consumer lag and disk usage, as those are often the first signs of trouble.
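In the Helm values that usually comes down to a few metrics flags plus a ServiceMonitor so the Prometheus Operator discovers the endpoints automatically. The keys below match the chart versions I've used, but they're worth verifying against yours, and the monitoring namespace is an assumption:
metrics:
  kafka:
    enabled: true          # kafka-exporter: consumer lag, topic and partition metrics
  jmx:
    enabled: true          # JMX exporter sidecar: broker JVM and disk metrics
  serviceMonitor:
    enabled: true          # ServiceMonitor resource for the Prometheus Operator
    namespace: monitoring  # assumption: where my Prometheus Operator runs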
For disaster recovery, my primary strategy is ensuring all critical topics have a replication factor of at least 3. I also use my cloud provider’s volume snapshot capabilities to back up the PVCs, and for cross-cluster DR, I use MirrorMaker 2.
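For the PVC backups, a VolumeSnapshot per broker data volume is usually enough. A minimal sketch, assuming a CSI snapshot class named csi-snapclass and the chart's default PVC naming (data-kafka-0 for broker 0), looks like this:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: kafka-0-data-snapshot
  namespace: kafka
spec:
  volumeSnapshotClassName: csi-snapclass      # assumption: your cluster's CSI snapshot class
  source:
    persistentVolumeClaimName: data-kafka-0   # assumption: PVC name for broker 0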
Final Thoughts
Using the Bitnami Helm chart has made running Kafka on Kubernetes manageable for me. My key takeaways are to always use a dedicated namespace, start with a robust values.yaml that includes persistence and external access, and set a replication factor of at least 3 for all production topics. I also can’t overstate the importance of setting up monitoring from day one, especially for disk usage and consumer lag. This approach has given me a solid, repeatable foundation for all my Kafka deployments.
Related Articles
- Flux2 GitOps - Automate Kafka deployments with Flux
- GKE Cluster Operations - Provision clusters for Kafka
- MuleSoft on Kubernetes - Integrate Kafka with MuleSoft