Kubernetes Workers Autoscaling based on RabbitMQ queue size

Kubernetes Horizontal Pod Autoscalers (HPA) can save you a lot of money. A basic HPA based on CPU utilization is easy to launch, but what do you do if you want to scale based on an external service or external metrics? This article will help you set that up properly in a GKE environment, so that workers scale up only when there is real need. The setup works from Kubernetes 1.10 onward, although metric labels may cause some issues in versions below 1.12.

1. Deploy Custom Metrics Stackdriver adapter

kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "$(gcloud config get-value account)"
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml
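
Before moving on, it can help to confirm that the adapter actually started. The following is a minimal sketch, assuming the manifest above creates its resources in a namespace called custom-metrics and registers the v1beta1.external.metrics.k8s.io API service (check the manifest if your version differs). check_adapter is a hypothetical helper name, not part of the adapter:

```shell
# check_adapter: hypothetical helper, not shipped with the adapter.
# Assumes the adapter manifest deploys into the "custom-metrics" namespace.
check_adapter() {
  # The adapter pod should be listed in the Running state.
  kubectl get pods --namespace custom-metrics || return 1
  # The external metrics API should be registered and report Available.
  kubectl get apiservice v1beta1.external.metrics.k8s.io
}

# Usage: check_adapter
```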

2. Prometheus and Stackdriver exporters

Now we need to deploy a RabbitMQ metrics exporter using the kbudde/rabbitmq-exporter container, which serves RabbitMQ metrics in Prometheus format, together with prometheus-to-sd, which automatically exports Prometheus metrics to Stackdriver. We can merge both into one deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: amqp-metrics
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod labels
  selector:
    matchLabels:
      app: amqp-metrics
  template:
    metadata:
      labels:
        app: amqp-metrics
    spec:
      containers:
      # Serves RabbitMQ metrics in Prometheus format on PUBLISH_PORT
      - name: prometheus
        image: kbudde/rabbitmq-exporter:v0.29.0
        env:
        - name: RABBIT_URL
          value: http://SOMERABBITMQHOST:15672
        - name: RABBIT_USER
          value: SOMEUSERNAME
        - name: RABBIT_PASSWORD
          value: SOMEPASSWORD
        - name: PUBLISH_PORT
          value: "9419"
        # For RabbitMQ 3.6.9 and later
        - name: RABBIT_CAPABILITIES
          value: "bert,no_sort"
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
      # Scrapes the exporter on localhost and pushes the metrics to Stackdriver
      - name: prometheus-to-sd
        image: gcr.io/google-containers/prometheus-to-sd:v0.2.3
        command:
        - /monitor
        - --source=:http://localhost:9419
        - --stackdriver-prefix=custom.googleapis.com
        - --pod-id=$(POD_ID)
        - --namespace-id=$(POD_NAMESPACE)
        env:
        - name: POD_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
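
Once the deployment is running, you may want to check that the exporter can reach RabbitMQ before looking for anything in Stackdriver. The sketch below assumes the exporter serves its metrics under /metrics on the PUBLISH_PORT configured above and that curl is installed locally. check_exporter is a hypothetical helper name:

```shell
# check_exporter: hypothetical helper that prints the ready-message gauges
# served by the exporter container of the amqp-metrics deployment.
check_exporter() {
  # Forward local port 9419 to the exporter (PUBLISH_PORT in the manifest).
  kubectl port-forward deployment/amqp-metrics 9419:9419 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # give the tunnel a moment to come up

  # Print only the ready-message gauges; exits non-zero if none are found.
  curl -s http://localhost:9419/metrics | grep '^rabbitmq_queue_messages_ready'
  status=$?

  # Stop the port-forward again, whether or not the check succeeded.
  kill "$pf_pid" 2>/dev/null || true
  return "$status"
}

# Usage: check_exporter
```

If nothing is printed, the RABBIT_URL, RABBIT_USER and RABBIT_PASSWORD values are the first things to check.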

3. Verify exported metrics and deploy HPA

At this point the metrics should be visible in the Google Stackdriver Metrics Explorer.

Now we are ready to set up the autoscaler using the exported metrics:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      # Only consider the series for this queue
      metricSelector:
        matchLabels:
          metric.labels.queue: myqueue
      targetValue: 20

You can find all available RabbitMQ metrics at https://github.com/kbudde/rabbitmq_exporter. You can also use the following labels as filters: cluster, vhost, queue, durable, policy, self (filtering may not work properly in 1.10 and some 1.11 releases).

Now our my-workers deployment will grow whenever the RabbitMQ queue myqueue holds more than 20 unprocessed jobs in total.
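
It helps to see why a targetValue of 20 behaves this way. According to the documented HPA algorithm, the controller multiplies the current replica count by the ratio of the observed metric to the target and rounds up. The sketch below is a simplification under that assumption: desired_replicas is a hypothetical helper, it ignores the controller's tolerance band and stabilization delays, and it uses integer arithmetic only.

```shell
# desired_replicas CURRENT METRIC TARGET MIN MAX
# Simplified sketch of the HPA rule ceil(CURRENT * METRIC / TARGET),
# clamped to [MIN, MAX]. Not the controller's actual code.
desired_replicas() {
  want=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling of the ratio
  [ "$want" -lt "$4" ] && want=$4       # never below minReplicas
  [ "$want" -gt "$5" ] && want=$5       # never above maxReplicas
  echo "$want"
}

desired_replicas 1 45 20 1 10   # 45 ready messages, 1 worker    -> 3
desired_replicas 3 45 20 1 10   # still 45 ready with 3 workers  -> 7
desired_replicas 3 0 20 1 10    # queue drained                  -> 1
```

Because targetValue is compared with the queue total rather than a per-pod share, the deployment keeps growing until the backlog drops to around 20 or maxReplicas is reached.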

Using metricSelector, you can create separate autoscalers for different queues and worker deployments.
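
One way to manage several such autoscalers is to generate the manifest from a small template. The following is only a sketch: hpa_for_queue is a hypothetical helper, the HPA, deployment and queue names in the usage line are made up, and the replica limits are simply copied from the example above.

```shell
# hpa_for_queue HPA_NAME DEPLOYMENT QUEUE TARGET
# Hypothetical helper: prints an HPA manifest for one queue/deployment pair.
hpa_for_queue() {
  cat <<EOF
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: $1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $2
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: $3
      targetValue: $4
EOF
}

# Usage (names are placeholders):
#   hpa_for_queue emails-hpa email-workers emails 50 | kubectl apply -f -
```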

4. Verify HPA and deployment

To verify our HPA we can use the kubectl describe hpa command:

kubectl describe hpa workers-hpa
Name:               workers-hpa
Namespace:          default
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"workers-hpa","namespace":"defaul...
CreationTimestamp:  Thu, 20 Dec 2018 12:20:26 -0500
Reference:          Deployment/my-workers
Metrics:            ( current / target )
  "custom.googleapis.com|rabbitmq_queue_messages_ready" (target value):  0 / 20
Min replicas:       1
Max replicas:       10
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from external metric custom.googleapis.com|rabbitmq_queue_messages_ready(&LabelSelector{MatchLabels:map[string]string{metric.labels.queue: viral_marketing,resource.labels.cluster_name: apps-cloud,},MatchExpressions:[],})
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
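
If the conditions report that the metric cannot be fetched, querying the external metrics API directly can narrow things down. This sketch assumes the adapter serves the external.metrics.k8s.io/v1beta1 API and that the HPA lives in the default namespace. external_metric is a hypothetical helper, and the optional label selector has to be URL-encoded by hand:

```shell
# external_metric NAMESPACE METRIC [URL_ENCODED_LABEL_SELECTOR]
# Hypothetical helper: fetches one external metric through the API server.
external_metric() {
  path="/apis/external.metrics.k8s.io/v1beta1/namespaces/$1/$2"
  # Append the selector only when one was given.
  [ -n "${3:-}" ] && path="$path?labelSelector=$3"
  kubectl get --raw "$path"
}

# Usage:
#   external_metric default 'custom.googleapis.com|rabbitmq_queue_messages_ready' \
#     'metric.labels.queue%3Dmyqueue'
```

An empty items list in the response usually means the adapter is reachable but no time series matched the metric name or labels.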

Conclusion

External metrics open the way to a new level of cluster and application efficiency. Try experimenting with other metrics, or export your own. Thank you for reading.