Practical Guide to Kubernetes Scaling #2 Nodes
We check out and configure Node Autoscaling on Gcloud with a practical example, checking metrics and numbers.
- Part 1: Metrics and Pod Auto Scaling
- Part 2: this article
Pod Scaling to the limits
In the first part we used a simple but very CPU-hungry Python app and deployed it with Kubernetes. We then checked the metrics and set up Pod Autoscaling. At the end of part 1 we reached the node pool's CPU limits, but the Pod Autoscaler still queued up new pods, which left a lot of Pods stuck in Pending state.
Let’s see if we can handle this using the GKE Kubernetes Node Autoscaling.
Get that good cluster
We use Gcloud (GKE) as an example, but the general mechanism behind Node Scaling is the Kubernetes Cluster Autoscaler, so the concepts apply elsewhere too.
gcloud container clusters create test --zone europe-west4-a --enable-autoscaling --num-nodes 3 --max-nodes 4
We define a cluster with 3 nodes (GKE default) and say the autoscaler can scale to a maximum of 4.
Be aware that you can quite easily reach your zone limits, especially if you’re on the Free Trial membership. To work around this you can also spread your cluster's nodes across multiple zones (example here).
Deploy the cpu hungry python app
Much like in part 1:
git clone git@github.com:wuestkamp/kubernetes-scale-that-app.git
cd kubernetes-scale-that-app
git checkout part2 # change branch
kubectl kustomize i | kubectl apply -f - # render the kustomization in directory i and apply it
Now we watch the node and pod metrics:
watch "kubectl top node && kubectl top pod"
Check Pod/Container Limits
In branch part2 there should already be limits specified for the container in deployment.yaml. Check that they were deployed with:
kubectl describe pod app-57476d48cf-2d9fw | grep Limits -A3 # use one of your own pod names
Create Horizontal Pod Autoscaler
kubectl autoscale deployment app --cpu-percent=50 --min=3 --max=10
kubectl get hpa
This will scale the number of replicas up or down depending on load, just like in part 1.
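The replica count the Pod Autoscaler aims for follows the formula given in the Kubernetes documentation: desired = ceil(current replicas * current metric / target metric), kept within the min and max bounds. Here is a minimal Python sketch of that calculation. The function name and the utilization numbers are made up for illustration, and the real controller additionally applies a tolerance and stabilization windows that are ignored here.

```python
import math

def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas):
    """Approximate the HPA target: ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]. Ignores tolerance and cooldowns."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))

# Settings from the kubectl autoscale command above: 50% target, 3 to 10 replicas.
# The utilization values are made-up sample numbers.
print(desired_replicas(3, 100, 50, 3, 10))  # 3 pods at 100% -> 6
print(desired_replicas(3, 400, 50, 3, 10))  # raw result 24, capped at 10
print(desired_replicas(6, 10, 50, 3, 10))   # raw result 2, raised to the minimum 3
```

Note that the CPU percentage is measured relative to the CPU the container requests, which is one reason the resource settings checked in the previous step matter.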
See Pod and Node autoscaler in action
We hit the app once:
curl http://22.214.171.124:5000 # use your LoadBalancer IP
In response, the Pod Autoscaler spun up more pods, but one of them stayed in Pending state.
The Node Autoscaler then created another Node, so all pods could be scheduled.
Once the load has been processed, the Pod Autoscaler scales the pods down again, which in turn lets the Node Autoscaler scale the nodes down again.
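What we just watched can be replayed with a toy model. The sketch below places pods on nodes by CPU request using simple first-fit, reports what stays Pending, and adds a node while that would help and the max-nodes limit allows it. This is a simplification and not the real scheduler or autoscaler code; the function names and all numbers (900m CPU per node, 400m per pod, 7 pods) are hypothetical.

```python
def schedule(pod_requests_m, node_count, node_cpu_m):
    """First-fit placement by CPU request (millicores).
    Returns the free CPU per node and the list of requests left Pending."""
    free = [node_cpu_m] * node_count
    pending = []
    for request in pod_requests_m:
        for i, capacity in enumerate(free):
            if request <= capacity:
                free[i] -= request
                break
        else:
            pending.append(request)  # no node had room for this pod
    return free, pending

def nodes_after_scale_up(pod_requests_m, node_count, node_cpu_m, max_nodes):
    """Add nodes one at a time while pods are Pending, a fresh node would fit
    at least one of them, and the max-nodes limit is not reached."""
    while node_count < max_nodes:
        _, pending = schedule(pod_requests_m, node_count, node_cpu_m)
        if not any(request <= node_cpu_m for request in pending):
            break
        node_count += 1
    return node_count

# Hypothetical numbers: 900m allocatable CPU per node, 400m requested per pod.
pods = [400] * 7
print(schedule(pods, 3, 900)[1])              # [400] -> one pod Pending
print(nodes_after_scale_up(pods, 3, 900, 4))  # 4 -> one node added
print(schedule(pods, 4, 900)[1])              # [] -> everything scheduled
```

With more pods than four nodes can hold, the model stops at 4 nodes and the remaining pods stay Pending, which is the same wall we hit at the end of part 1, just one node later.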
When does Cluster Autoscaler change the size of a cluster?
Cluster Autoscaler increases the size of the cluster when:
- there are pods that failed to schedule on any of the current nodes due to insufficient resources.
- adding a node similar to the nodes currently present in the cluster would help.
Cluster Autoscaler decreases the size of the cluster when some nodes are consistently unneeded for a significant amount of time. A node is unneeded when it has low utilization and all of its important pods can be moved elsewhere. (source)
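The scale-down rule can be sketched as a simple check per node. The example assumes the upstream defaults as I understand them from the Cluster Autoscaler FAQ (a utilization threshold of 50% and 10 minutes of being unneeded); GKE may tune these differently. It also looks at CPU requests only and reduces "all important pods can be moved elsewhere" to a boolean, so treat it as an illustration with hypothetical names and numbers.

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    name: str
    requested_cpu_m: int     # sum of CPU requests of the pods on the node
    allocatable_cpu_m: int   # CPU the node offers to pods
    pods_movable: bool       # can every important pod be rescheduled elsewhere?
    unneeded_minutes: int    # how long the node has looked unneeded

UTILIZATION_THRESHOLD = 0.5  # assumed default, not read from a real cluster
UNNEEDED_MINUTES = 10        # assumed default, not read from a real cluster

def is_scale_down_candidate(node):
    """Low utilization, movable pods, and unneeded for long enough."""
    utilization = node.requested_cpu_m / node.allocatable_cpu_m
    return (utilization < UTILIZATION_THRESHOLD
            and node.pods_movable
            and node.unneeded_minutes >= UNNEEDED_MINUTES)

nodes = [
    NodeState("node-a", 800, 900, True, 30),   # busy
    NodeState("node-b", 100, 900, True, 12),   # idle for long enough
    NodeState("node-c", 100, 900, True, 3),    # idle, but only briefly
    NodeState("node-d", 100, 900, False, 30),  # idle, but its pods are stuck
]
print([n.name for n in nodes if is_scale_down_candidate(n)])  # ['node-b']
```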
Considering Pod scheduling and disruption
When scaling down, cluster autoscaler respects scheduling and eviction rules set on Pods. These restrictions can prevent a node from being deleted by the autoscaler. A node’s deletion could be prevented if it contains a Pod with any of these conditions:
- The Pod’s affinity or anti-affinity rules prevent rescheduling.
- The Pod has local storage.
- The Pod is not managed by a Controller such as a Deployment, StatefulSet, Job or ReplicaSet. (source)
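The three conditions above can be written down as a small predicate. This is a sketch of the listed rules only, not the autoscaler's actual implementation; the Pod fields are hypothetical simplifications of what the real Pod spec contains.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pod:
    name: str
    controller_kind: Optional[str] = None  # e.g. "ReplicaSet"; None for a bare pod
    has_local_storage: bool = False
    affinity_blocks_move: bool = False     # (anti-)affinity prevents rescheduling

def blocking_reasons(pod):
    """Return the reasons why this pod would keep its node from being deleted."""
    reasons = []
    if pod.affinity_blocks_move:
        reasons.append("affinity rules prevent rescheduling")
    if pod.has_local_storage:
        reasons.append("uses local storage")
    if pod.controller_kind is None:
        reasons.append("not managed by a controller")
    return reasons

def node_can_be_removed(pods):
    """A node is removable only if none of its pods has a blocking reason."""
    return all(not blocking_reasons(p) for p in pods)

pods = [
    Pod("app-1", controller_kind="ReplicaSet"),
    Pod("debug-shell"),  # a bare pod created by hand
    Pod("cache-0", controller_kind="StatefulSet", has_local_storage=True),
]
for p in pods:
    print(p.name, blocking_reasons(p))
print(node_can_be_removed(pods))      # False
print(node_can_be_removed(pods[:1]))  # True
```

For our demo app this is good news: its pods come from a Deployment, so on their own they should not hold a node back.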
Nodes not scaling down?
When scaling down, cluster autoscaler honors a graceful termination period of 10 minutes for rescheduling the node’s Pods onto a different node before forcibly terminating the node.
Occasionally, cluster autoscaler cannot scale down completely and an extra node exists after scaling down. This can occur when required system Pods are scheduled onto different nodes, because there is no trigger for any of those Pods to be moved to a different node. (source)
How does this all work btw?
Check out the Kubernetes Cluster Autoscaler FAQ.
In this article series we got hands-on experience seeing the Pod and Node autoscalers in action together. Pretty smooth.