I’ll never forget the day I saw our monthly GKE bill: $45,000. I literally did a double-take at my laptop screen. My first thought was that there had to be a mistake, maybe an extra zero somewhere. But no, this was real. I immediately opened a Slack DM with our CFO, and let’s just say she wasn’t thrilled. She gave me two weeks to come back with a plan, or we’d be having some very uncomfortable budget conversations.
That night, I stayed late digging through our Kubernetes clusters. What I found was almost embarrassing. Pods requesting 4Gi of memory but using barely 500Mi. Entire node pools sitting idle during off-hours. Development environments that nobody had touched in weeks, still burning through compute credits 24/7. It was like discovering you’d been paying for three gym memberships you never used, except way more expensive.
The worst part? I had set up some of these over-provisioned configs myself six months earlier. Back then, we were having production issues, and my solution had been to just throw more resources at everything. “Better safe than sorry,” I’d told myself. Turns out, I’d been very safe and very, very sorry when that bill arrived.
My Cost Reduction Journey
pie title My Monthly Cost Reduction Journey
"Remaining monthly cost: $18k" : 18
"Monthly savings: $27k" : 27
I spent that weekend building out a three-phase plan: Measure, Optimize, and Automate. I knew I couldn’t just start randomly cutting resources without data to back up my decisions. That’s how you end up with worse problems than a high cloud bill.
graph LR
Start[High Bill: $45k]
Measure["Phase 1: Measure"]
Optimize["Phase 2: Optimize"]
Automate["Phase 3: Automate"]
Result[Optimized: $18k]
Start --> Measure --> Optimize --> Automate --> Result
style Start fill:#ff6b6b,color:#fff
style Result fill:#51cf66,color:#fff
My Step-by-Step Optimization Plan
Step 1: I Attacked Over-provisioning with VPA
Here’s something nobody tells you about Kubernetes: developers are terrible at guessing resource requirements. Myself included. When you’re deploying something new and you’re not sure how much memory it needs, you add a safety margin. Then you add another safety margin just to be extra safe. Before you know it, you’re requesting 4Gi of memory for a service that peaks at 600Mi.
My first move was deploying the Vertical Pod Autoscaler in recommendation-only mode (updateMode set to "Off"), which watches real usage and publishes suggestions without ever touching a running pod. I wasn’t brave enough to let it auto-adjust things right away (I’d learned my lesson about being too aggressive with changes). Instead, I wanted two weeks of solid data before making any decisions.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # "Off" is recommendation-only: the VPA reports suggestions but never evicts or resizes pods. I started here to be safe.
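Once the VPA has been watching for a day or two, the recommendations show up on the object itself: kubectl describe vpa my-app-vpa (or kubectl get vpa my-app-vpa -o yaml) prints a status.recommendation section with a lower bound, target, and upper bound for each container. The target is the number to pay attention to.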
Two weeks later, I had the data I needed, and it was eye-opening. Our API service that we’d allocated 4Gi for? It was consistently using 512Mi. Our background job processor that requested 8 CPU cores? It averaged 1.2 cores. I took screenshots of everything and sent them to the team. The responses ranged from “oh wow” to “oops” to complete silence.
Applying the VPA recommendations across our production workloads was nerve-wracking. I did it gradually, one deployment at a time, watching our monitoring dashboards like a hawk. But nothing broke. In fact, nothing even slowed down. We just stopped paying for resources we weren’t using. This single change accounted for about 40% of our total savings.
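To make it concrete, here’s roughly what the change looked like on one container of the API service. The numbers below are illustrative rather than our exact production values, but the pattern is the real one: take the VPA target and leave some headroom instead of copying it verbatim.

```yaml
# One container's resources block, right-sized from VPA data (values illustrative).
resources:
  requests:
    memory: "768Mi"   # was 4Gi; the VPA target sat around 512Mi, so ~50% headroom
    cpu: "500m"       # illustrative; the old request was a couple of full cores
  limits:
    memory: "1Gi"     # was 4Gi
```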
Step 2: I Embraced Spot Instances
I’ll be honest, I’d always been scared of spot instances. The idea that Google could just yank my VMs away with 30 seconds notice made me nervous. But when I actually sat down and thought about which workloads could handle interruptions, I realized we had a lot of them.
Our CI/CD runners? If a build gets interrupted, it just retries. Our batch data processing jobs? They’re designed to be fault-tolerant. Our entire development and staging environments? Nobody cares if a dev pod gets rescheduled. I created a new node pool using preemptible VMs, which are 60-80% cheaper than regular instances, and moved everything that could tolerate interruptions over to it.
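Here’s a minimal sketch of how a workload opted into the cheap pool. The cloud.google.com/gke-preemptible label is what GKE puts on preemptible nodes; the taint key, the ci-runner name, and the image are placeholders for this sketch, which assumes the pool was created with a matching NoSchedule taint so nothing lands there by accident.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ci-runner                    # hypothetical interruption-tolerant workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ci-runner
  template:
    metadata:
      labels:
        app: ci-runner
    spec:
      # Pin to the preemptible pool: GKE labels those nodes automatically.
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"
      # Tolerate the NoSchedule taint assumed to be on the pool,
      # so only workloads that explicitly opt in can run there.
      tolerations:
      - key: "preemptible"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: runner
        image: registry.example.com/ci-runner:latest   # placeholder image
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
```

Tainting the pool means new workloads stay on on-demand nodes unless someone deliberately opts them in, which is exactly the default you want for anything interruption-sensitive.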
The first time a spot instance got preempted in production hours, I got a Slack notification and my heart skipped a beat. But then I watched as the pod seamlessly rescheduled to another node, and everything kept humming along. Over the next month, we had maybe a dozen preemptions, and not a single one caused any user-facing issues.
Step 3: I Tuned the Cluster Autoscaler
This one was more subtle but still impactful. Our cluster autoscaler was being way too conservative about scaling down. Nodes would sit there idle for hours after traffic dropped off, and we’d keep paying for them. I tweaked the configuration to be more aggressive, and I also set it to prefer spot instances when scaling up new capacity.
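For context on what “more aggressive” means in practice: GKE’s autoscaler is managed, so you don’t edit its flags directly; the closest lever there is switching the cluster’s autoscaling profile from the default balanced to optimize-utilization, which reclaims underused nodes much sooner. If you run the open-source cluster-autoscaler yourself, the same ideas map onto flags. The fragment below is a sketch with illustrative values, not our production config.

```yaml
# Fragment of a self-managed cluster-autoscaler Deployment (values illustrative).
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
  command:
  - ./cluster-autoscaler
  - --cloud-provider=gce
  - --scale-down-unneeded-time=5m            # default is 10m: reclaim idle nodes sooner
  - --scale-down-utilization-threshold=0.6   # default is 0.5: call a node "underused" earlier
  - --scale-down-delay-after-add=5m          # default is 10m: shorter cool-down after scale-up
  - --expander=priority                      # rank cheaper (spot/preemptible) node groups first,
                                             # driven by a cluster-autoscaler-priority-expander ConfigMap
```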
Step 4: I Put a Stop to Resource Sprawl
After I’d optimized our existing resources, I needed to make sure the problem wouldn’t come back. So I implemented ResourceQuota objects on all our non-production namespaces. Suddenly, when someone tried to deploy a new service requesting ridiculous amounts of memory “just to be safe,” they’d hit the quota limit and have to actually think about what they needed.
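The quota objects themselves are tiny. Something along these lines went into each non-production namespace; the numbers are illustrative and varied by team.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a-dev        # one of these per non-production namespace
spec:
  hard:
    requests.cpu: "16"         # total CPU the namespace may request
    requests.memory: 32Gi      # total memory the namespace may request
    limits.cpu: "32"
    limits.memory: 64Gi
    pods: "50"
```

Pairing this with a LimitRange helps too: once a quota on CPU and memory is in place, pods that don’t declare requests get rejected outright, and a LimitRange can supply sensible defaults instead.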
This led to some interesting conversations. A developer would ping me saying their deployment was failing, and I’d look at their config and say, “Do you really need 16Gi of memory for a service that parses JSON?” Usually the answer was no. Sometimes it was “I don’t know,” which meant we’d deploy it with reasonable limits and watch what actually happened.
Step 5: I Automated Cleanup
Here’s an embarrassing discovery: we had 847 completed jobs just sitting in our cluster. Not running, not doing anything, just taking up space in etcd and cluttering our dashboards. Some of them were six months old. I also found failed pods from deployments we’d long since deleted, just hanging around like ghosts.
I wrote a simple CronJob that runs daily and cleans up completed jobs older than 7 days and failed pods older than 3 days. It’s maybe 20 lines of kubectl commands, but it made our cluster so much cleaner and took a noticeable amount of clutter out of etcd and our dashboards.
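My exact script isn’t worth copying verbatim, but a sketch of the idea looks like this. It assumes a cluster-janitor ServiceAccount with RBAC to list and delete jobs and pods, plus an image that ships kubectl, bash, and jq; those names and the schedule are placeholders, not details from our setup.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cluster-janitor
  namespace: kube-system
spec:
  schedule: "0 3 * * *"                       # nightly at 03:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-janitor  # needs RBAC to list/delete jobs and pods cluster-wide
          restartPolicy: Never
          containers:
          - name: janitor
            image: bitnami/kubectl:latest      # any image with kubectl, bash, and jq
            command: ["/bin/bash", "-c"]
            args:
            - |
              set -euo pipefail
              # 1. Delete completed Jobs older than 7 days.
              cutoff=$(( $(date +%s) - 7*24*3600 ))
              kubectl get jobs --all-namespaces -o json \
                | jq -r --argjson cutoff "$cutoff" '
                    .items[]
                    | select((.status.succeeded // 0) > 0)
                    | select(.status.completionTime != null)
                    | select((.status.completionTime | fromdateiso8601) < $cutoff)
                    | "\(.metadata.namespace) \(.metadata.name)"' \
                | while read -r ns name; do
                    kubectl delete job -n "$ns" "$name"
                  done
              # 2. Delete failed Pods (the 3-day age check follows the same pattern as above).
              kubectl delete pods --all-namespaces --field-selector=status.phase=Failed
```

Newer clusters can also sidestep part of this by setting ttlSecondsAfterFinished on Job specs, which lets the control plane garbage-collect finished Jobs on its own.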
Step 6: I Shut Down Dev Environments at Night
This one felt almost too simple to work. Our development environments were running 24/7, even though nobody was using them between 7 PM and 8 AM, or on weekends. I set up a CronJob to scale everything in our dev namespaces down to zero at 7 PM and back up at 8 AM on weekdays.
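Here’s a sketch of the scale-down half. It assumes the dev workloads live in namespaces labeled env=dev and that a dev-scaler ServiceAccount has permission to list namespaces and scale deployments; those names, the image, and the timezone are placeholders. The 8 AM scale-up job is a mirror image with --replicas=1 and a schedule of 0 8 * * 1-5.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-nightly-shutdown
  namespace: kube-system
spec:
  schedule: "0 19 * * 1-5"              # 7 PM on weekdays
  timeZone: "America/New_York"          # placeholder; spec.timeZone needs Kubernetes 1.27+
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dev-scaler     # needs RBAC to list namespaces and scale deployments
          restartPolicy: Never
          containers:
          - name: scaler
            image: bitnami/kubectl:latest    # any image with kubectl and bash
            command: ["/bin/bash", "-c"]
            args:
            - |
              set -euo pipefail
              # Scale every Deployment in every dev-labeled namespace down to zero.
              for ns in $(kubectl get namespaces -l env=dev -o name | cut -d/ -f2); do
                kubectl scale deployment --all --replicas=0 -n "$ns"
              done
```

Coming back at a single replica is usually fine for dev; if you need the original replica counts restored, stash them in an annotation before scaling down.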
The first Monday morning after implementing this, I got a few Slack messages around 8:05 AM from developers saying their environments were slow to come up. But after a couple of weeks, people adjusted their morning routine, and the complaints stopped. This simple automation saved us about 70% on our development environment costs, which was roughly $6,000 per month.
The Real Results
After three months of focused effort, the results were staggering. Our monthly bill dropped from $45,000 to $18,000. Our average CPU utilization went from a wasteful 25% to a much healthier 65%.
My Key Takeaways
- Measure first: You can’t optimize what you don’t measure. Tools like Kubecost and the Vertical Pod Autoscaler were essential for giving me the data I needed.
- Right-size everything: Most pods are over-provisioned by default. This is the lowest-hanging fruit for cost savings.
- Spot instances are your friend: For any workload that can handle interruptions, spot instances provide massive savings.
- Automate everything: Automate cleanup and dev environment shutdowns. These small, consistent savings add up quickly.
- Cost optimization is a continuous process: This isn’t a one-time project. I set up monthly reviews to ensure costs don’t creep back up.