Kubernetes — Prometheus collect NUMA information
Regular readers of our blog know by now that we care about NUMA architecture and optimization. Although the architecture itself is fairly easy to understand, keeping NUMA nodes balanced in daily operations is a struggle.
It’s already hard to track the current NUMA node balance in a pure VMware vSphere environment, and monitoring it with the additional Kubernetes/container layer on top makes it even harder.
We’re going to cover much more on NUMA and other critical resource monitoring and optimization in the future, but let’s start with the simplest part: how to get the NUMA details of your Kubernetes nodes using the popular Prometheus Node exporter.
Helm and Daemonset
The typical Helm chart creates a DaemonSet that deploys a Node exporter pod on every Kubernetes node.
That way you don’t need to think about deploying a Node exporter pod each time you add a new node to your cluster. The good thing: every exporter pod has labels set, so Prometheus automatically scrapes its metrics once the pod is up and running.
You can easily check on your Prometheus server whether the nodes are already being scraped:
Simply open the Prometheus server and check whether Kubernetes-nodes are listed as targets.
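The same check can be scripted against the Prometheus HTTP API, whose /api/v1/targets endpoint returns the active scrape targets. The following is a minimal sketch, not a definitive implementation. The job name kubernetes-nodes and the server address used in the usage note are assumptions that depend on your scrape configuration.

```python
import json
import urllib.request


def targets_for_job(payload, job):
    """Return (scrapeUrl, health) for every active target of the given job."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"))
        for t in active
        if t.get("labels", {}).get("job") == job
    ]


def fetch_targets(base_url):
    """Query the Prometheus HTTP API for the current scrape targets."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)
```

With a port-forward to the server in place, something like targets_for_job(fetch_targets("http://localhost:9090"), "kubernetes-nodes") should list one entry per node, each with health "up" once scraping works.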
Check Node Exporter metrics
To check the gathered metrics, simply type node_ into the Prometheus expression field and select a metric from the suggestions.
But when you start searching for NUMA metrics, this list will remain empty (unless you already made the change this blog post is about).
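If you prefer to check this from a script, the Prometheus HTTP API can list all known metric names through /api/v1/label/__name__/values. The sketch below filters that list for NUMA-related Node exporter metrics. The filter rule (a node_ prefix plus the substring "numa") is an assumption chosen for illustration.

```python
import json
import urllib.request


def numa_metric_names(names):
    """Keep only Node exporter metric names that mention NUMA."""
    return sorted(n for n in names if n.startswith("node_") and "numa" in n.lower())


def fetch_metric_names(base_url):
    """List every metric name Prometheus currently knows about."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]
```

An empty result from numa_metric_names(fetch_metric_names(...)) means the NUMA collector is not enabled yet.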
Enable NUMA collector
When you read through the official documentation, you’ll notice that some collectors are not enabled by default. That is also true for all NUMA-related metrics, which the documentation lists under /proc/meminfo_numa (on the host itself, the per-node data is exposed under /sys/devices/system/node/).
To enable NUMA collection, you need to set an additional argument for the Node exporter pods within the DaemonSet. To do that, let’s first look up the DaemonSet.
The next step is to edit the DaemonSet you want to change.
kubectl edit daemonset.apps/prometheus-node-exporter
Here you should find a spec section for the containers that includes the args. Add --collector.meminfo_numa and save the file.
spec:
  containers:
  - args:
    - --path.procfs=/host/proc
    - --path.sysfs=/host/sys
    - --collector.meminfo_numa
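The same change can also be made without an interactive edit, using kubectl patch with a JSON Patch. The sketch below only builds the patch and the command line. It assumes that the Node exporter is the first container in the pod template and that it already has an args list, since the patch appends to that list and fails if it is missing. The function names are made up for this example.

```python
import json


def numa_collector_patch(container_index=0, flag="--collector.meminfo_numa"):
    """Build a JSON Patch (RFC 6902) that appends the flag to a container's args."""
    path = f"/spec/template/spec/containers/{container_index}/args/-"
    return [{"op": "add", "path": path, "value": flag}]


def kubectl_patch_command(daemonset, patch):
    """Return the kubectl argument list; pass it to subprocess.run to apply it."""
    return ["kubectl", "patch", "daemonset", daemonset,
            "--type=json", "-p", json.dumps(patch)]
```

For the DaemonSet from this post, kubectl_patch_command("prometheus-node-exporter", numa_collector_patch()) yields the full command. Note that the sketch does not check whether the flag is already present, so running it twice would add the argument twice.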
When you save the DaemonSet, all related pods are automatically terminated and recreated with the new set of arguments.
Wait a few minutes, search for NUMA again in Prometheus, and you should see some metrics coming in.
node_memory_numa_interleave_hit_total is one example.
Perfect. Now we can add some meaningful charts to our dashboards. You can either use your own Grafana instance or our Performance Analyzer product, which has it all built in (the Kubernetes integration is still in beta, but the final release is only a few days away). Just contact us if you want to give it a try.
- NUMA Nodes
- NUMA Interleave hit
- NUMA Hit (Number/Byte)
- NUMA Home Miss
- NUMA Foreign
- NUMA local
- NUMA other
If you want to learn a bit more about the most important NUMA metrics on Linux, this list should be a good start:
- numa_hit: Number of pages allocated from the node the process wanted.
- numa_miss: Number of pages allocated from this node, but the process preferred another node.
- numa_foreign: Number of pages allocated on another node, but the process preferred this node.
- local_node: Number of pages allocated from this node while the process was running locally.
- other_node: Number of pages allocated from this node while the process was running remotely (on another node).
- interleave_hit: Number of pages allocated successfully with the interleave strategy.
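To make the counters above more tangible, here is a small sketch that parses the content of a per-node numastat file (on a Linux host, for example /sys/devices/system/node/node0/numastat) and derives a simple ratio from it. The hit_ratio helper is an illustration chosen for this example, not a metric defined by the kernel or the Node exporter.

```python
def parse_numastat(text):
    """Parse the 'name value' lines of a per-node numastat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def hit_ratio(stats):
    """Share of pages allocated on this node that were intended for it."""
    hits = stats.get("numa_hit", 0)
    total = hits + stats.get("numa_miss", 0)
    return hits / total if total else None
```

A ratio close to 1 means almost every page placed on this node was meant to be there. A falling ratio means the node is increasingly absorbing allocations that preferred another node.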
Originally published at https://www.opvizor.com on April 25, 2019.