How I Build Resilient Services with Traefik Circuit Breakers and Rate Limiting

Feb 16, 2024

I learned the importance of resilience the hard way, when a single failing downstream service caused a cascade of failures that took down our entire application. That’s when I realized I needed to protect my services from both internal failures and external traffic surges. I found the solution in Traefik Proxy’s middleware, which lets me implement circuit breakers and rate limiting without changing a single line of application code. Here’s how I do it.

My Resilience Architecture

My approach is to have Traefik act as a smart gatekeeper. All traffic comes through a Traefik Ingress, which passes it through a chain of middleware before it ever hits my application. The circuit breaker watches the ratio of failed responses, and when that ratio crosses a threshold it “opens” and starts failing fast, returning a 503 Service Unavailable without even trying to contact my app. The rate limiter, on the other hand, enforces a requests-per-second ceiling and returns a 429 Too Many Requests to any client that exceeds it.

graph TB
    Internet[Internet Traffic] --> Ingress[Traefik Ingress]
    Ingress --> Middleware{My Middleware Chain}

    Middleware --> CircuitBreaker[Circuit Breaker]
    Middleware --> RateLimit[Rate Limiter]

    CircuitBreaker --> |Normal| App[Application Pod]
    CircuitBreaker --> |High Errors| Fallback[Fail Fast: 503]
    RateLimit --> |OK| App
    RateLimit --> |Too Many| TooManyRequests[Shed Load: 429]

    style CircuitBreaker fill:#ffcccc,stroke:#cc0000
    style RateLimit fill:#ccf,stroke:#00c

My Implementation Workflow

Here’s the process I follow to set this up.

Step 1: Deploy Traefik and a Sample App

First, I get Traefik running in my cluster using its Helm chart. Then, I deploy a sample application to protect. I often use httpbin for this because it lets me easily simulate error responses.
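In practice that’s just a handful of commands. Here’s a minimal sketch; the release name, the traefik namespace, and the kennethreitz/httpbin image are simply my usual choices, and I name the deployment app1 so it matches the Service my Ingress points at later.

# Install Traefik from its official Helm chart
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik --namespace traefik --create-namespace

# Deploy httpbin as the sample app and expose it as a Service named app1
kubectl create deployment app1 --image=kennethreitz/httpbin --port=80
kubectl expose deployment app1 --port=80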

Step 2: Configure the Circuit Breaker

Next, I create the first layer of protection: the circuit breaker. I create a Middleware custom resource in Kubernetes. My go-to expression is ResponseCodeRatio(500, 600, 0, 600) > 0.25, which is the ratio of responses with a status code in [500, 600) to responses with a code in [0, 600). In other words, Traefik opens the circuit if more than 25% of recent responses are 5xx errors. While the circuit is open, Traefik immediately fails subsequent requests, protecting my app from being hammered while it’s struggling.

# circuitbreaker-middleware.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: circuitbreaker
  namespace: default
spec:
  circuitBreaker:
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.25
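The expression is the only required field. Two things worth knowing: Traefik v3 drops the deprecated containo.us API group in favor of traefik.io/v1alpha1, and recent releases also accept network-error and latency expressions plus a few timing knobs. Here’s a sketch with illustrative thresholds rather than recommendations:

# circuitbreaker-tuned.yaml (illustrative values; use traefik.io/v1alpha1 on Traefik v3)
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: circuitbreaker-tuned
  namespace: default
spec:
  circuitBreaker:
    # Trip on a 5xx ratio, a network-error ratio, or slow median latency
    expression: ResponseCodeRatio(500, 600, 0, 600) > 0.25 || NetworkErrorRatio() > 0.10 || LatencyAtQuantileMS(50.0) > 250
    checkPeriod: 100ms       # how often the expression is evaluated
    fallbackDuration: 10s    # how long the circuit stays open before probing again
    recoveryDuration: 10s    # ramp back up gradually once the backend recovers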

Step 3: Configure the Rate Limiter

The second layer is rate limiting. I create another Middleware resource, this time with a rateLimit spec. I usually start with an average of 100 requests per second.

# ratelimit-middleware.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: ratelimit
  namespace: default
spec:
  rateLimit:
    average: 100
    # burst: 50 # I can add this to allow temporary bursts
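By default the limit is applied per client IP. When traffic reaches Traefik through another proxy or load balancer, I sometimes adjust how the client is identified using sourceCriterion. A sketch, assuming a single trusted proxy hop in front of Traefik (the resource name is illustrative):

# ratelimit-per-client.yaml (illustrative)
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: ratelimit-per-client
  namespace: default
spec:
  rateLimit:
    average: 100
    burst: 50
    period: 1s            # average is counted per period; 1s is the default
    sourceCriterion:
      ipStrategy:
        depth: 1          # take the IP one position from the right of X-Forwarded-For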

Step 4: Apply the Middleware to the Ingress

The final step is to apply these protections. I do this by adding an annotation to my application’s Ingress resource, pointing to the middlewares I created. The format is namespace-middlewarename@kubernetescrd, and I can chain multiple middlewares together by separating them with commas.

# ingress-with-middleware.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app1-ingress
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: default-circuitbreaker@kubernetescrd,default-ratelimit@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app1
            port:
              number: 80
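When I route with Traefik’s own IngressRoute CRD instead of a standard Ingress, the same middlewares attach as a list on the route rather than through an annotation. Roughly equivalent to the Ingress above (the web entrypoint name assumes the Helm chart defaults):

# ingressroute-with-middleware.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: app1-ingressroute
  namespace: default
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      middlewares:
        - name: circuitbreaker
        - name: ratelimit
      services:
        - name: app1
          port: 80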

Step 5: Test Everything

I never deploy something like this without testing it. To exercise the circuit breaker, I use a load-testing tool like hey and point it at the httpbin endpoint that always returns a 500 error, which simulates a failing backend.

hey -z 10s -c 5 -q 5 http://app.example.com/status/500

I can see in the results that after a certain point, I stop getting the original error and start getting 503 Service Unavailable responses—that’s the circuit breaker doing its job!
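I test the rate limiter the same way, just against a healthy endpoint and at a rate comfortably above the 100 req/s average (the worker counts below are simply what I tend to use):

# ~20 workers x 20 req/s each = roughly 400 req/s, well over the limit
hey -z 10s -c 20 -q 20 http://app.example.com/get

The status code distribution in hey’s summary should then show a mix of 200s and 429s, which confirms the limiter is shedding the excess load.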

My Final Takeaways

My key takeaway is that these two middlewares are essential for any production service. The circuit breaker protects you from internal cascading failures, and the rate limiter protects you from external traffic spikes. I always start with conservative thresholds and then tune them based on real-world metrics from Traefik’s Prometheus endpoint. By letting Traefik handle this at the edge, my application code can stay focused on business logic while the ingress layer handles resilience. It’s a powerful separation of concerns that has made my services much more stable.
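That Prometheus endpoint is just Traefik’s built-in metrics exporter. Depending on the Helm chart version it may already be enabled; if not, here is a minimal sketch of how I switch it on via extra CLI arguments in my Helm values (the values-metrics.yaml file name is just illustrative):

# values-metrics.yaml (applied with: helm upgrade traefik traefik/traefik -n traefik -f values-metrics.yaml)
additionalArguments:
  - "--metrics.prometheus=true"   # exposes request counts, status codes, and durations on /metrics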
