A Practical Guide to A/B Testing on Kubernetes with Traefik Mesh

Jan 20, 2023

I had a new version of a feature that I thought was much better, but the product team wasn’t sure how users would react. Actually, let me be honest: they were pretty skeptical. I’d spent two weeks refactoring the entire user dashboard, convinced my new design was cleaner and more intuitive. But when I showed it to the product manager, her first words were, “I don’t know… the current version works fine.”

That’s when we realized we needed data, not opinions. The problem was, I’d never implemented A/B testing on Kubernetes before. I knew about service meshes in theory, but had never actually deployed one. After looking at options like Istio (which felt like bringing a tank to a knife fight for our small team), I landed on Traefik Mesh. It promised simplicity, and after a week of late-night tinkering and more than a few “why isn’t this working” moments, I got it running. Here’s what I learned.

The Architecture I Settled On

The beauty of this setup is its simplicity. Traefik Mesh sits between our users and our services, inspecting every incoming request. When it sees a specific header (I chose the User-Agent for Firefox users, mostly because I use Firefox myself and wanted to dogfood the new version), it routes that traffic to my new “B” service. For everyone else, traffic gets load-balanced across both versions.

Why not just do a 50/50 split? I wanted more control. This approach meant I could target specific user segments, and if things went sideways, I wasn’t exposing half my user base to a potentially broken experience.

graph LR
    Users[Users] --> Gateway[Nginx Gateway]
    Gateway --> TraefikMesh[Traefik Mesh]

    TraefikMesh --> |No Header| ServiceA[Service A]
    TraefikMesh --> |No Header| ServiceB[Service B]
    TraefikMesh --> |Firefox Header| ServiceB

    style TraefikMesh fill:#f9f,stroke:#333

Getting It Running

Let me walk you through how I actually set this up. Fair warning: the first time I did this, I messed up the namespace configuration and spent an embarrassing amount of time debugging why my pods couldn’t talk to each other.

Step 1: Install Traefik Mesh

The installation itself is straightforward. I used Helm because, honestly, who wants to manually apply dozens of YAML files?

helm repo add traefik-mesh https://helm.traefik.io/mesh
helm repo update
helm install traefik-mesh traefik-mesh/traefik-mesh -n traefik-mesh --create-namespace
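Before moving on, it's worth confirming the mesh controller and the per-node proxy pods actually came up. Nothing fancy:

# All pods in the mesh namespace should be Running
kubectl get pods -n traefik-mesh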

Step 2: Deploy My Application Versions

This part was easy since I already had my deployments ready. I just needed to create two separate Kubernetes Service resources, one pointing to version “A” and one to version “B”. Both versions run simultaneously, which feels weird at first but is exactly what we need.
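For reference, here's roughly what those Service resources look like. The names (svc-a, svc-b, and the svc-ab root service that the TrafficSplit later references) and the demo-ab namespace match the manifests further down in this post; the labels and ports are just illustrative stand-ins for your own:

# services.yaml -- minimal sketch; selectors and ports are illustrative
apiVersion: v1
kind: Service
metadata:
  name: svc-a
  namespace: demo-ab
spec:
  selector:
    app: dashboard
    version: a        # pods running version A
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: svc-b
  namespace: demo-ab
spec:
  selector:
    app: dashboard
    version: b        # pods running version B
  ports:
  - port: 80
    targetPort: 8080
---
# The root service clients actually call; the TrafficSplit below
# decides how its traffic is divided between svc-a and svc-b
apiVersion: v1
kind: Service
metadata:
  name: svc-ab
  namespace: demo-ab
spec:
  selector:
    app: dashboard    # spans pods from both versions
  ports:
  - port: 80
    targetPort: 8080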

Step 3: Define the Routing Logic with SMI

Here’s where things got interesting. Traefik Mesh uses the Service Mesh Interface (SMI) specification, which I hadn’t worked with before. The SMI specs are pretty elegant once you understand them. I needed to create an HTTPRouteGroup resource that defines which traffic to match; a TrafficSplit (coming up in a moment) then decides where that traffic goes.

In my case, I wanted to catch all Firefox users. The regex pattern for matching the User-Agent header was tricky (my first attempt caught some weird bot traffic), but eventually I landed on this:

# http-route.yaml
apiVersion: specs.smi-spec.io/v1alpha3
kind: HTTPRouteGroup
metadata:
  name: http-everything
  namespace: demo-ab
spec:
  matches:
  - name: firefox-users
    headers:
    - user-agent: ".*Firefox.*"
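If you want to sanity-check a pattern like this before it goes anywhere near the cluster (and avoid the bot-traffic surprise I ran into), you can run it against real User-Agent strings locally. The strings here are just examples:

# A genuine Firefox UA should match...
echo "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0" | grep -E ".*Firefox.*"

# ...and a typical crawler UA should not
echo "Mozilla/5.0 (compatible; Googlebot/2.1)" | grep -E ".*Firefox.*" || echo "no match"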

With the matching rule defined, I created a TrafficSplit resource. This is where the magic happens. The TrafficSplit lets you define exact percentages of traffic going to each backend service. I set it to send 100% of the Firefox traffic to service B, while everyone else gets distributed between both services.

The first time I deployed this, I accidentally set the weight wrong and sent ALL traffic to the new version. My heart stopped when I saw the error rate spike. Always double-check your weights before applying.

# traffic-split.yaml
apiVersion: split.smi-spec.io/v1alpha3
kind: TrafficSplit
metadata:
  name: server-split
  namespace: demo-ab
spec:
  service: svc-ab # The root service
  backends:
  - service: svc-a
    weight: 0
  - service: svc-b
    weight: 100
  matches:
  - kind: HTTPRouteGroup
    name: http-everything
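Both manifests get applied like any other resource, and because they're just custom resources, you can inspect exactly what the cluster accepted:

kubectl apply -f http-route.yaml -f traffic-split.yaml

# Confirm the split landed as intended (check those weights!)
kubectl get trafficsplit server-split -n demo-ab -o yaml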

Step 4: Test the Routing

Before letting this anywhere near production, I needed to validate the routing actually worked. I’m a big believer in testing everything twice, especially traffic routing.

When I sent a standard curl request without any special headers, the responses alternated between versions A and B. But when I added the Firefox User-Agent header, every single response came back from service B. Not 99%. Not “mostly”. Every. Single. One.

I must have run this test about fifty times, partly because I was paranoid, partly because it was genuinely satisfying to watch it work.

# GATEWAY_IP is the external IP of the gateway in front of the mesh

# Test with Firefox User-Agent
curl -s -H "user-agent: Mozilla/5.0 (Firefox)" $GATEWAY_IP

# Test with default User-Agent
curl -s $GATEWAY_IP
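To make my fifty-runs ritual less tedious, a small loop can do the tallying for you. This assumes each version identifies itself in its response body (mine did):

# Fire 20 requests as Firefox and count which service answered;
# every line of the tally should come from service B
for i in $(seq 1 20); do
  curl -s -H "user-agent: Mozilla/5.0 (Firefox)" "$GATEWAY_IP"; echo
done | sort | uniq -c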

What Happened Next

Once I had this running in production, things got interesting fast. Within days, we had data showing the new version had a 23% higher engagement rate among Firefox users. The product manager, who had been skeptical from the start, suddenly became my biggest advocate. “Can we expand this to Chrome users?” she asked.

But the real win wasn’t just proving my redesign worked. It was having a reliable way to test anything. Want to try a new payment flow? Route 5% of users and measure conversion rates. Curious if that cache strategy improves performance? Split traffic and compare response times. This opened up a new way of making product decisions based on real data instead of gut feelings.

The gradual rollout capability has saved me more than once. I can adjust traffic weights incrementally. Start at 10%, watch the metrics, bump it to 25%, keep watching. If something breaks, dial it back down.
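In practice, a gradual rollout is just the same TrafficSplit with different weights and no matches clause, so the split applies to all traffic rather than a single header segment. The 10% stage looks like this:

# traffic-split.yaml -- gradual rollout, 10% to the new version
apiVersion: split.smi-spec.io/v1alpha3
kind: TrafficSplit
metadata:
  name: server-split
  namespace: demo-ab
spec:
  service: svc-ab
  backends:
  - service: svc-a
    weight: 90
  - service: svc-b
    weight: 10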

What I’d Tell My Past Self

Looking back, there are a few things I wish I’d known. First, don’t overthink the service mesh choice. I spent way too long evaluating options when Traefik Mesh’s simplicity was exactly what I needed. Sometimes the best tool is the one you can actually understand and debug at 2 AM.

Second, the SMI specification is your friend. At first, all those Kubernetes custom resources felt overwhelming, but once I understood that HTTPRouteGroup defines the “what” and TrafficSplit defines the “where,” everything clicked.

Third, you don’t need to change application code. The routing happens entirely at the infrastructure level. Your application has no idea it’s part of an A/B test, which means you can apply this pattern to any service, even ones you didn’t write.

Finally, the ability to adjust traffic weights dynamically is more powerful than I initially realized. It’s not just for A/B testing. It’s for canary deployments, blue-green rollouts, performance comparisons, and any scenario where you need control over which users see which version.
