Zero-Downtime Releases with My Blue-Green Deployment Playbook Using Traefik

For my most critical applications, a failed deployment is not an option. A rolling update can be risky because mixed versions serve traffic during the rollout, and rolling back means waiting for another rollout to finish. That’s why I use the blue-green deployment strategy for high-stakes releases. It gives me zero-downtime releases with the safety net of a near-instant rollback, because reverting is a single weight change. Traefik Proxy’s weighted routing keeps the process simple. Here’s my playbook.

Why I Choose Blue-Green

I choose blue-green when I absolutely cannot have mixed versions of my application running at the same time, and when an instant rollback is a must-have. It does require having double the infrastructure running for a short period, but for critical services, that temporary cost is a small price to pay for the massive reduction in risk.

My architecture for this is straightforward. I have two identical deployments, which I call “blue” (the current version) and “green” (the new version). The magic happens in a TraefikService custom resource, which acts as a weighted load balancer, directing traffic based on the weights I set.

graph LR
    Users[Users] --> Ingress[IngressRoute]
    Ingress --> TraefikSvc[TraefikService<br/>Weighted Routing]

    TraefikSvc -->|Weight: 100| Blue[Blue Service v6.1.5]
    TraefikSvc -->|Weight: 0| Green[Green Service v6.1.6]

    style Blue fill:#2196f3,color:#fff
    style Green fill:#4caf50,color:#fff

My Deployment Workflow

Here is the step-by-step process I follow for every blue-green release.

Step 1: Deploy Both Environments

First, I make sure the current version, “blue,” is running. Then I deploy the new version, “green,” into the same namespace. Each Deployment sits behind a Kubernetes Service of the same name, because those Services are what Traefik routes to. At this point, green is running but not receiving any live traffic.

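# blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue
spec:
  selector:            # required by apps/v1
    matchLabels:
      app: blue
  template:
    metadata:
      labels:
        app: blue      # must match spec.selector
    spec:
      containers:
      - name: podinfo
        image: ghcr.io/stefanprodan/podinfo:6.1.5 # Current version
---
# green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: green
spec:
  selector:            # required by apps/v1
    matchLabels:
      app: green
  template:
    metadata:
      labels:
        app: green     # must match spec.selector
    spec:
      containers:
      - name: podinfo
        image: ghcr.io/stefanprodan/podinfo:6.1.6 # New version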
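
The TraefikService in the next step refers to Kubernetes Services named blue and green on port 80, so each Deployment needs one. The manifests below are a minimal sketch rather than my exact files: the file name, the app: blue / app: green pod labels, and the targetPort of 9898 (podinfo’s default HTTP port) are assumptions to adjust for your own setup.

```yaml
# services.yaml (hypothetical file name)
apiVersion: v1
kind: Service
metadata:
  name: blue
spec:
  selector:
    app: blue          # assumed pod label on the blue Deployment
  ports:
  - name: http
    port: 80           # the port the TraefikService refers to
    targetPort: 9898   # assumed container port (podinfo default)
---
apiVersion: v1
kind: Service
metadata:
  name: green
spec:
  selector:
    app: green         # assumed pod label on the green Deployment
  ports:
  - name: http
    port: 80
    targetPort: 9898
```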

Step 2: Configure Initial Traffic Routing

Next, I create the TraefikService resource. Initially, I set the weights so that blue gets 100% of the traffic and green gets 0%. This directs all live traffic to the current, stable version.

# traefik-service.yaml
# Note: traefik.containo.us is the Traefik v2 API group; on Traefik v3 use traefik.io/v1alpha1.
apiVersion: traefik.containo.us/v1alpha1
kind: TraefikService
metadata:
  name: bluegreen-wrr
spec:
  weighted:
    services:
    - name: blue
      port: 80
      weight: 100 # 100% of traffic to blue
    - name: green
      port: 80
      weight: 0   # 0% of traffic to green

I then point my main IngressRoute to this TraefikService.
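
For reference, an IngressRoute that hands traffic to the weighted service could look roughly like the sketch below. The route name, the web entry point, and the host name are placeholders I made up for illustration. The part that matters is kind: TraefikService on the service reference, which tells Traefik to look up bluegreen-wrr as a TraefikService instead of a plain Kubernetes Service.

```yaml
# ingressroute.yaml (hypothetical file name)
apiVersion: traefik.containo.us/v1alpha1   # same API group as the TraefikService
kind: IngressRoute
metadata:
  name: bluegreen                # hypothetical name
spec:
  entryPoints:
  - web                          # assumed entry point name
  routes:
  - match: Host(`bluegreen.example.com`)   # placeholder host
    kind: Rule
    services:
    - name: bluegreen-wrr
      kind: TraefikService       # reference the weighted service, not a Service
```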

Step 3: Validate the Green Environment

This is a critical step. Before I make the switch, I thoroughly validate the green environment. I port-forward directly to the green service to run manual tests, and I have automated smoke tests that run against the internal service name (http://green.demo-bluegreen/). I don’t proceed until I’m confident the new version is healthy.
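
To give an idea of what such a smoke test can look like, here is a minimal sketch in shell. It is not my real test suite: smoke_test is a hypothetical helper, and the three paths are endpoints that podinfo exposes, so swap in whatever your application serves.

```shell
#!/usr/bin/env sh
# Hypothetical smoke test: fail unless every endpoint returns HTTP 200.
smoke_test() {
  base_url="$1"
  for path in /healthz /readyz /version; do
    # Print only the HTTP status code; give up after 5 seconds.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base_url}${path}")
    if [ "$code" != "200" ]; then
      echo "FAIL ${path} -> HTTP ${code}" >&2
      return 1
    fi
    echo "ok   ${path}"
  done
}

# Manual check from a workstation (run the port-forward in another terminal):
#   kubectl port-forward svc/green 8080:80 -n demo-bluegreen
#   smoke_test http://localhost:8080
#
# In-cluster check against the internal service name:
#   smoke_test http://green.demo-bluegreen
```

The function returns a non-zero status on the first failing endpoint, so it can gate the switch in a pipeline.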

Step 4: The Instant Switch

Now for the switch. This is the simplest and most powerful part. I update the TraefikService manifest, changing the weights to blue: 0 and green: 100, and run kubectl apply.

# traefik-service-green.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: TraefikService
metadata:
  name: bluegreen-wrr
spec:
  weighted:
    services:
    - name: blue
      port: 80
      weight: 0    # 0% traffic to blue
    - name: green
      port: 80
      weight: 100  # 100% of traffic to green

The switch is effectively instantaneous. As soon as Traefik picks up the updated resource, all new requests go to the green environment. No pods restart and no DNS records change.

Step 5: Monitor and Rollback (If Needed)

After the switch, I closely monitor the green environment’s logs and metrics. Because the blue pods are still running, I can roll back at any sign of trouble by reapplying the original manifest with blue: 100 and green: 0. The rollback is just as fast as the switch.
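
Since the switch and the rollback differ only in two numbers, one option is to render the manifest from a small helper instead of keeping two files in sync. This is a sketch under assumptions, not part of my actual tooling: render_wrr is a hypothetical function name, and the namespace is the demo-bluegreen one used elsewhere in this post.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the TraefikService manifest for the given weights.
render_wrr() {
  blue_weight="$1"
  green_weight="$2"
  cat <<EOF
apiVersion: traefik.containo.us/v1alpha1
kind: TraefikService
metadata:
  name: bluegreen-wrr
  namespace: demo-bluegreen
spec:
  weighted:
    services:
    - name: blue
      port: 80
      weight: ${blue_weight}
    - name: green
      port: 80
      weight: ${green_weight}
EOF
}

# Switch to green:
#   render_wrr 0 100 | kubectl apply -f -
# Roll back to blue:
#   render_wrr 100 0 | kubectl apply -f -
```

Keeping both directions in one function makes it harder to apply a stale file during an incident.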

Step 6: Decommission the Blue Environment

Once I’m confident the green version is stable (usually after a day or so), I scale the blue deployment down to zero replicas to save resources. From that point on, a rollback is no longer instant, because blue has to be scaled back up first.

kubectl scale deployment blue --replicas=0 -n demo-bluegreen

My Final Thoughts

My key takeaway is that blue-green deployments, powered by Traefik’s weighted routing, are the safest way I’ve found to release critical changes. The near-instant rollback is the safety net that makes this work. The most important practice I follow is a rigorous validation process for the green environment before the switch. I also make sure any database changes are backward-compatible, so both the blue and green versions can coexist and a rollback stays safe. This strategy has taken the stress out of my most critical production releases.
