I wanted to run our MuleSoft applications on Kubernetes to take advantage of its scalability and efficiency. While MuleSoft Runtime Fabric (RTF) lets you do this, I quickly discovered a major limitation: it doesn’t support native autoscaling. You can only scale manually through the Anypoint Platform. This was a deal-breaker for our production environment. So, I figured out how to implement our own autoscaling using Kubernetes HorizontalPodAutoscalers (HPAs). Here’s my complete guide to setting up a production-ready, autoscaling MuleSoft environment on GKE.
My Architecture
The architecture I landed on uses a single GKE cluster with separate namespaces for each environment (dev, staging, prod). This is cost-efficient and allows for great environment separation. The MuleSoft control plane manages the deployments, but I use Kubernetes-native HPAs to handle the scaling.
```mermaid
graph TB
    subgraph AP[Anypoint Platform]
        AM[Anypoint Management]
        RM[Runtime Manager]
        API[API Manager]
    end
    subgraph GKE[GKE Cluster]
        subgraph DevNS[Dev Namespace]
            DevRTF[Runtime Fabric Agent]
            DevApp1[Mule App 1]
            DevApp2[Mule App 2]
            DevHPA1[HPA]
        end
        subgraph StgNS[Staging Namespace]
            StgRTF[Runtime Fabric Agent]
            StgApp1[Mule App 1]
            StgHPA1[HPA]
        end
        subgraph ProdNS[Production Namespace]
            ProdRTF[Runtime Fabric Agent]
            ProdApp1[Mule App 1]
            ProdApp2[Mule App 2]
            ProdHPA1[HPA]
            ProdHPA2[HPA]
        end
    end
    RM --> DevRTF
    RM --> StgRTF
    RM --> ProdRTF
    DevHPA1 -.->|Autoscale| DevApp1
    StgHPA1 -.->|Autoscale| StgApp1
    ProdHPA1 -.->|Autoscale| ProdApp1
    ProdHPA2 -.->|Autoscale| ProdApp2
    style AP fill:#00a0df,color:#fff
    style ProdNS fill:#4caf50,color:#fff
```
The Setup Process
Here are the steps I follow to get everything up and running.
Step 1: Prepare the GKE Cluster
First, I prepare a GKE cluster optimized for MuleSoft workloads, making sure the HorizontalPodAutoscaling addon is enabled.
```bash
# Note: with --region, node counts and autoscaler limits apply per zone,
# so this regional cluster starts with 9 nodes (3 per zone across 3 zones).
gcloud container clusters create mulesoft-cluster \
  --region=us-central1 \
  --num-nodes=3 \
  --machine-type=n2-standard-4 \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --addons=HorizontalPodAutoscaling,HttpLoadBalancing
```
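Once the cluster is up, I point kubectl at it:

```bash
# Fetch cluster credentials into the local kubeconfig
gcloud container clusters get-credentials mulesoft-cluster --region=us-central1
```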
Steps 2-3: Configure Anypoint Platform
In the Anypoint Platform, I create three environments: Development, Staging, and Production. This allows me to logically separate my deployments.
Step 4: Install Runtime Fabric
Next, I follow the official MuleSoft guide to install the RTF agent into my cluster. This involves getting an installation command from the Anypoint Platform’s Runtime Manager and running it with kubectl.
After the installation, I verify that the rtf-agent and other components are running in the rtf namespace.
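A quick check confirms everything is healthy:

```bash
# The rtf-agent and other RTF components should all be Running
kubectl get pods -n rtf
```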
Step 5: Associate RTF with Environments
This is a crucial step. In Runtime Manager, I associate my newly created RTF with the Development, Staging, and Production environments. This allows me to deploy applications to any of these targets from the same control plane.
Step 6: Deploy a Test Application
To test the setup, I deploy a simple “hello-world” application from the Anypoint Exchange into my dev environment. I configure it to run with 1 replica.

After deploying, I can test the endpoint to make sure it’s working.
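A simple curl does the job; the host here is a placeholder for whatever address the service exposes:

```bash
curl http://<service-endpoint>/hello
```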

Step 7: Implementing the HPA for Autoscaling
Now for the most critical part: implementing the autoscaling that MuleSoft RTF lacks. I do this by creating a HorizontalPodAutoscaler (HPA) resource for each Mule application I deploy.
To create the HPA, I need two key pieces of information:
- The Environment ID from Anypoint Platform, which RTF uses as the Kubernetes namespace.
- The Application Name, which becomes the name of the Deployment resource created by RTF (both values can be verified with kubectl, as shown below).
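Since RTF creates one namespace per associated environment and names the Deployment after the application, both values can be confirmed from the cluster itself:

```bash
# Namespaces are named after the Anypoint Environment IDs
kubectl get namespaces

# The Deployment created by RTF carries the application name
kubectl get deployments -n <environment-id>
```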
I then create a YAML manifest for the HPA. In the spec, I point scaleTargetRef to the application's Deployment and configure it to scale on CPU utilization, scaling up when average CPU exceeds 70% of the pods' requests (a memory metric can be added the same way).
```yaml
# hpa-hello-world.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-world-dev
  namespace: <environment-id>  # The ID from Anypoint Platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-world-dev  # The application name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
    scaleUp:
      stabilizationWindowSeconds: 0    # Scale up immediately
```
I apply this manifest with kubectl apply -f hpa-hello-world.yaml and can then watch the HPA manage my application’s replicas.
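For reference, here are the exact commands; describe is handy for confirming the HPA found its target and is actually reading metrics:

```bash
kubectl apply -f hpa-hello-world.yaml

# Check the HPA's target, current metrics, and recent scaling events
kubectl describe hpa hello-world-dev -n <environment-id>
```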
Step 8: Testing the Autoscaling
To see it in action, I generate load against the application endpoint using a tool like hey.
```bash
hey -z 5m -c 100 http://<service-endpoint>/hello
```
As the CPU load increases, I can watch the HPA automatically create new pods to handle the traffic by running kubectl get hpa -n <environment-id> -w.
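The watch output looks roughly like the sketch below (the numbers are illustrative, not captured from a real run), and kubectl top gives the per-pod view; GKE ships metrics-server by default, so no extra install is needed:

```bash
# Illustrative output from: kubectl get hpa -n <environment-id> -w
# NAME              REFERENCE                    TARGETS        MINPODS   MAXPODS   REPLICAS
# hello-world-dev   Deployment/hello-world-dev   cpu: 85%/70%   1         10        4

# Per-pod CPU and memory, useful for tuning requests and thresholds
kubectl top pods -n <environment-id>
```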
My Production Strategy
To make this repeatable, I wrote a shell script that generates and applies the HPA manifest for any given application. This lets me enforce different scaling rules per environment, for example letting dev idle at a single replica while keeping a minimum of 3 replicas in production.
```bash
# Example usage of my script
./deploy-mule-app-with-hpa.sh api-orders dev-env-id 1 5
./deploy-mule-app-with-hpa.sh api-orders stg-env-id 1 10
./deploy-mule-app-with-hpa.sh api-orders prod-env-id 3 50
```
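The script itself isn't reproduced here, so below is a minimal sketch of what it might look like, assuming the arguments are app name, environment ID, min replicas, and max replicas (which matches the production example above); the 70% CPU threshold is an assumption carried over from the manual manifest.

```bash
#!/usr/bin/env bash
# deploy-mule-app-with-hpa.sh -- minimal sketch, not the actual script.
# Assumed arguments: <app-name> <environment-id> <min-replicas> <max-replicas>
set -euo pipefail

APP_NAME="$1"    # Mule app name; RTF names the Deployment after it
NAMESPACE="$2"   # Anypoint Environment ID; RTF uses it as the namespace
MIN="$3"
MAX="$4"

# Render the HPA manifest and apply it in one shot
kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${APP_NAME}
  minReplicas: ${MIN}
  maxReplicas: ${MAX}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # assumed threshold, same as the manual manifest
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
EOF
```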
Final Thoughts
Running MuleSoft on Kubernetes has been a game-changer for us, but only after I solved the autoscaling challenge with HPAs. My key takeaway is that for every app I deploy via the Anypoint UI, I need a corresponding HPA managed via GitOps. It's crucial to set proper resource requests in Anypoint, since the HPA computes utilization as a percentage of each pod's requests, and then to tune the thresholds based on real-world load. This hybrid approach, deploying from MuleSoft's control plane while managing scaling with Kubernetes-native tools, gives me the best of both worlds: centralized management and production-grade, automated scalability.
Related Articles
- GKE Cluster Operations - Provision GKE for MuleSoft
- Kafka on Kubernetes - Integrate with Kafka
- Kubernetes Dashboard Access - Monitor MuleSoft apps