After auditing over 30 Kubernetes clusters and implementing controls that prevented several potential breaches, I’ve learned that Kubernetes is not secure by default. You have to be deliberate and proactive. I approach Kubernetes security with a “defense-in-depth” strategy, layering controls at every level of the stack. This is my personal playbook for hardening production Kubernetes clusters.
Let me be clear about why I’m so paranoid about Kubernetes security. Two years ago, I was called in to help with an incident response at a company that had been breached. Attackers had exploited an overly permissive RBAC policy to gain cluster-admin access, then used that to deploy cryptominers across every node. The monthly cloud bill jumped from $15,000 to $87,000 before anyone noticed. That incident could have been prevented with basic security hygiene.
My Defense-in-Depth Strategy
graph LR
    subgraph External["External Security"]
        Firewall[Firewall Rules]
    end
    subgraph Cluster["Cluster Security"]
        RBAC[RBAC Authorization]
        Audit[Audit Logging]
    end
    subgraph Network["Network Security"]
        NetPol[Network Policies]
    end
    subgraph Pod["Pod Security"]
        PSS[Pod Security Standards]
    end
    subgraph Runtime["Runtime Security"]
        Scan[Image Scanning]
        Monitor[Runtime Monitoring]
    end
    External --> Cluster --> Network --> Pod --> Runtime
Here are the key layers of my security playbook.
Layer 1: RBAC and Least Privilege (Or How to Not Hand Out cluster-admin Like Candy)
I see this mistake constantly: developers ask for Kubernetes access, and someone grants them cluster-admin because it’s easier than figuring out what permissions they actually need. This is how breaches happen.
My first rule: never use default service accounts, and always practice least privilege. I create fine-grained Roles and RoleBindings. A developer who needs to view logs gets a Role that allows get, list, and watch on pods and nothing else. They don’t get create, they don’t get delete, and they definitely don’t get access to Secrets.
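A log-viewer Role and its binding look roughly like this; the names, namespace, and user are placeholders:
# A read-only Role for viewing pod logs, plus its binding (illustrative names)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: log-viewer
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-viewer-binding
  namespace: production
subjects:
  - kind: User
    name: jane.developer
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: log-viewer
  apiGroup: rbac.authorization.k8s.io
Before handing over access, you can sanity-check a binding with kubectl auth can-i get pods --as jane.developer -n production.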
I also disable automatic token mounting for service accounts by default. Kubernetes loves to mount service account tokens into every pod automatically. This means a compromised pod can use that token to talk to the API server. I set automountServiceAccountToken: false on all service accounts, then explicitly enable it only for pods that genuinely need API access.
# I disable auto-mounting by default
apiVersion: v1
kind: ServiceAccount
metadata:
  name: limited-sa
automountServiceAccountToken: false
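For the few workloads that genuinely need API access, I opt back in at the pod level, which overrides the service account setting. A minimal sketch, with placeholder names and image:
# Explicitly opt in to token mounting for a pod that needs API access
apiVersion: v1
kind: Pod
metadata:
  name: needs-api-access
spec:
  serviceAccountName: api-client-sa
  automountServiceAccountToken: true
  containers:
    - name: app
      image: registry.example.com/controller:1.0.0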
Layer 2: Lock Down Pods (Because Containers Aren’t Magic Security Boundaries)
A compromised pod is how most Kubernetes breaches start. Attackers exploit a vulnerable application, get shell access to a pod, then try to escalate privileges or pivot to other pods. Pod Security Standards are your defense against this.
I enforce the restricted Pod Security Standard at the namespace level for all production workloads. This is non-negotiable.
# I apply this label to all my production namespaces
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
This means every pod must run as a non-root user, disable privilege escalation, drop all Linux capabilities, and use a runtime-default seccomp profile. I also explicitly define a secure securityContext in every pod spec rather than relying on the admission check alone. This breaks some badly-written applications, but if your app needs root access in a container, it’s not production-ready.
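Here’s a sketch of a pod spec that satisfies the restricted standard; the pod name, image, and UID are placeholders:
# A securityContext that passes the restricted Pod Security Standard
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
The read-only root filesystem isn’t strictly required by the restricted standard, but I set it anyway as an extra hardening step.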
Layer 3: Zero-Trust Networking (Default Deny Everything)
Here’s a terrifying fact: by default in Kubernetes, any pod can talk to any other pod on any port. Your frontend can talk directly to your database. A compromised test pod can scan your entire cluster. This is insane.
I immediately lock this down by applying a default-deny-all NetworkPolicy to all production namespaces. Nothing can talk to anything unless I explicitly allow it. Then I create specific policies for necessary traffic, like allowing the frontend to talk to the backend on port 8080 and nothing else.
I learned this the hard way during a security audit. The auditor showed me how they could reach our internal admin API from a public-facing pod. That shouldn’t have been possible, but our network was wide open. We fixed it that day.
# This policy denies all ingress and egress traffic in a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
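With the default deny in place, I layer narrow allow rules on top. Here’s a sketch of the frontend-to-backend policy I described; the labels and port are illustrative:
# Allows only frontend pods to reach backend pods on TCP 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080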
Layer 4: Secure the Supply Chain (Because Docker Hub Is Not Your Friend)
Container images are code from the internet. Would you run random code from the internet in production? No? Then why are you pulling random container images?
I use OPA Gatekeeper to enforce a policy that only allows images from our trusted container registry to run in the cluster. Any pod that tries to pull from Docker Hub or some random registry gets rejected at admission time.
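As a sketch, assuming the K8sAllowedRepos constraint template from the Gatekeeper policy library is installed, the constraint looks roughly like this; the registry prefix is a placeholder:
# Rejects pods whose images don't come from our trusted registry
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-registries
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "registry.example.com/"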
In our CI/CD pipeline, image scanning is a mandatory gate. We use Trivy to scan every image for high and critical vulnerabilities before it can be pushed to our registry. If the scan fails, the build fails. No exceptions. This has prevented us from deploying images with known critical vulnerabilities multiple times.
# A simple example of a CI/CD scanning step
security-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy image --severity HIGH,CRITICAL --exit-code 1 $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
Layer 5: Assume Breach and Monitor Everything
Defense in depth means assuming that despite all your controls, a breach could still happen. When it does, you need to detect it immediately.
I use Falco for runtime security monitoring. It watches system calls and detects suspicious behavior like shells being spawned in containers, unexpected outbound network connections, or processes reading sensitive files. I have custom rules tuned to our environment.
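As an illustration, a simple custom rule might look like this; the rule name and output format are my own, and the condition leans on the spawned_process and container macros from Falco’s default ruleset:
# An illustrative Falco rule for shells spawned inside containers
- rule: Shell Spawned in Container
  desc: Detect an interactive shell starting inside a container (illustrative)
  condition: spawned_process and container and proc.name in (bash, sh, zsh)
  output: "Shell in container (user=%user.name container=%container.name command=%proc.cmdline)"
  priority: WARNING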
Falco has alerted us to real attacks. Once it caught an attacker who had stolen credentials from a developer’s laptop trying to kubectl exec into a production pod at 2 AM. We blocked them before they could do any damage.
What This Actually Prevented
These controls aren’t theoretical. They’ve prevented real incidents.
We reduced our attack surface by over 85% through strict network policies. The security team measured this by running penetration tests before and after implementation.
Runtime monitoring has alerted us to and helped us prevent 12 potential breaches over two years. Most were compromised developer credentials. A few were actual attackers who had exploited application vulnerabilities but couldn’t escalate because of our pod security policies.
We achieved SOC 2 compliance in six months because of these documented and automated controls. The auditors loved that everything was enforced in code rather than relying on human processes.
We’ve had zero privilege escalation incidents in over two years. This is the metric I’m most proud of because privilege escalation is how small compromises become catastrophic breaches.
What I Learned About Kubernetes Security
Defense in depth is the only approach that works. Every layer I described has failed at some point, but the other layers caught what got through. You need multiple independent controls.
Default to least privilege for everything. Users, service accounts, network policies, everything. If someone needs more access, they can ask and justify it. But the default should be minimal permissions.
Zero-trust networking isn’t paranoid; it’s prudent. Deny all traffic by default and explicitly allow only what’s necessary. The first time you do this, you’ll discover all sorts of unnecessary network paths you didn’t know existed.
Automate security in CI/CD. Image scanning and policy checks must be mandatory gates that block bad code from reaching production. Manual security reviews don’t scale and humans forget things.
Monitor at runtime because prevention fails eventually. You need to detect and respond to threats as they happen, not discover them weeks later during an audit.