Node Exporter Coverage¶
WI-330, WI-365, WI-408 | ADR-0038 | Sprint 3
Summary¶
IT and OT nodes expose metrics that the Prometheus instance in archon-monitoring scrapes.
The four k3s IT nodes (caneast-site1-node2 through node5) run node_exporter from the k3s DaemonSet; caneast-site1-node1 and caneast-site1-ot2-cam01 run it as an Ansible-managed systemd service.
CanEast AI Node scrapes are deferred (WI-385, personal workstation, Phase 5+).
| Node | Zone | Exporter | Method | Port | Prometheus job |
|---|---|---|---|---|---|
| caneast-site1-node2 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node3 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node4 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node5 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node1 | IT | node_exporter | Ansible systemd | 9100 | node-exporter-static |
| caneast-site1-ot2-cam01 | OT-2 | node_exporter | Ansible systemd | 9100 | node-exporter-cam01 |
| alienware (Windows) | IT (deferred) | windows_exporter | Manual MSI | 9100 | alienware-host (removed WI-385) |
| alienware (WSL) | IT (deferred) | node_exporter | Manual systemd | 9101 | alienware-wsl (removed WI-385) |
k3s Nodes (DaemonSet)¶
The kube-prometheus-stack Helm chart deploys prometheus-node-exporter as a DaemonSet
on all k3s nodes (caneast-site1-node2/3/4/5). No additional configuration is required.
caneast-site1-node1 (RPi4, arm64)¶
Managed by the node_exporter Ansible role in archon-platform. The role auto-detects
architecture (aarch64 -> arm64, x86_64 -> amd64).
ANSIBLE_PRIVATE_KEY_FILE=~/.ssh/ansible-svc-account \
  .venv/bin/ansible-playbook \
  ansible/playbooks/it/node-exporter.yml \
  -i ansible/inventories/it/hosts.yml
caneast-site1-node1 uses SSH port 22 (not 2222) and is in the standalone_nodes inventory group.
Version is pinned in ansible/roles/node_exporter/defaults/main.yml.
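The architecture mapping described above can be sketched in shell. This is an illustration only, not the role's actual task code: the function name `node_exporter_arch` is hypothetical, and the role itself presumably does the equivalent with Ansible facts.

```shell
# node_exporter_arch MACHINE: map a `uname -m` value to the architecture
# suffix used in node_exporter release tarball names.
# Hypothetical helper; it covers only the two mappings this page names.
node_exporter_arch() {
  case "$1" in
    aarch64) echo "arm64" ;;
    x86_64)  echo "amd64" ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

On caneast-site1-node1 (RPi4), `node_exporter_arch "$(uname -m)"` would print `arm64`; any other machine type fails loudly instead of falling back to a wrong binary.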
CanEast AI Node two-layer monitoring (WI-365)¶
CanEast AI Node runs a Windows host and a WSL subsystem. Separate exporters cover both layers so Windows OS metrics and Linux dev-environment metrics are independently queryable.
| Layer | Exporter | Port | Job | Extra labels |
|---|---|---|---|---|
| Windows host | windows_exporter v0.30.5 | 9100 | alienware-host | host=alienware, layer=windows |
| WSL subsystem | node_exporter v1.8.2 | 9101 | alienware-wsl | host=alienware-wsl, layer=wsl, parent_host=alienware |
WSL port 9101 is forwarded to REDACTED:[REDACTED] via netsh interface portproxy on the
Windows host. A scheduled task refreshes the rule on restart, because the WSL IP address changes.
Full deployment steps: CanEast AI Node Monitoring Setup runbook
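As a rough sketch of what the refresh has to do, the helper below prints the two `netsh interface portproxy` commands that replace a forwarding rule. The function name `portproxy_commands`, the listen address, and the way the WSL address is obtained are assumptions for illustration; the actual scheduled task is defined in the runbook and may differ.

```shell
# portproxy_commands LISTEN_ADDR WSL_IP [PORT]: print the netsh commands that
# replace the forwarding rule for PORT (default 9101).
# Hypothetical helper; it only prints the commands, it does not run them.
portproxy_commands() {
  listen_addr="$1"
  wsl_ip="$2"
  port="${3:-9101}"
  # Remove the stale rule first, then point the port at the current WSL address.
  echo "netsh interface portproxy delete v4tov4 listenport=${port} listenaddress=${listen_addr}"
  echo "netsh interface portproxy add v4tov4 listenport=${port} listenaddress=${listen_addr} connectport=${port} connectaddress=${wsl_ip}"
}
```

Inside WSL, something like `portproxy_commands 0.0.0.0 "$(hostname -I | awk '{print $1}')"` would emit the commands for the current address (0.0.0.0 is a placeholder listen address); they then need to run in an elevated prompt on the Windows host.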
Prometheus scrape config¶
Non-k3s and CanEast AI Node targets are in additionalScrapeConfigs under
prometheus.prometheusSpec in kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml:
additionalScrapeConfigs:
  - job_name: node-exporter-static
    static_configs:
      - targets:
          - REDACTED:[REDACTED]  # caneast-site1-node1
  - job_name: alienware-host
    static_configs:
      - targets:
          - REDACTED:[REDACTED]
        labels:
          host: alienware
          layer: windows
  - job_name: alienware-wsl
    static_configs:
      - targets:
          - REDACTED:[REDACTED]
        labels:
          host: alienware-wsl
          layer: wsl
          parent_host: alienware
  - job_name: cadvisor-caneast-site1-node2
    static_configs:
      - targets:
          - REDACTED:[REDACTED]
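Prometheus refuses to load a configuration in which two scrape jobs share a job_name, so it can be worth checking the values file before a Helm upgrade. The helper below is a minimal sketch under stated assumptions: `duplicate_jobs` is a hypothetical name, it matches unquoted `job_name:` lines textually rather than parsing YAML, and it only sees jobs written in the file it is given.

```shell
# duplicate_jobs FILE: print every job_name value that occurs more than once.
# Text-based check (no YAML parsing); trailing "# comments" are stripped.
duplicate_jobs() {
  grep -E '^[[:space:]]*-?[[:space:]]*job_name:' "$1" |
    sed -E 's/^[[:space:]]*-?[[:space:]]*job_name:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' |
    sort | uniq -d
}
```

Running `duplicate_jobs kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml` should print nothing when every job name in the file is unique.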
OT Nodes (Ansible systemd)¶
caneast-site1-ot2-cam01 (Pi 5, OT zone ot-zone) -- WI-408¶
Managed by the node_exporter Ansible role in archon-platform, deployed via
ansible/playbooks/ot/cam01-monitoring.yml. Targets the ot-zone inventory group.
.venv/bin/ansible-playbook \
  ansible/playbooks/ot/cam01-monitoring.yml \
  -i ansible/inventories/ot/hosts.yml
Metrics collected: CPU, RAM, disk, thermal, network. No textfile collector.
Scrape interval: default (60s). Job name: node-exporter-cam01.
Prometheus scrape config (cam01 entry)¶
In kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml:
- job_name: node-exporter-cam01
  static_configs:
    - targets:
        - REDACTED:[REDACTED]  # caneast-site1-ot2-cam01 (Pi 5, OT zone ot-zone)
  relabel_configs:
    - source_labels: [__address__]
      target_label: instance
      replacement: caneast-site1-ot2-cam01
Verification¶
# All k3s DaemonSet nodes
up{job="node-exporter"}
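# caneast-site1-node1
# (the instance matcher assumes the instance label is relabeled to the hostname, as in the
# cam01 job; the node-exporter-static job shown above has no such rule, so match on job alone if this returns nothing)
up{job="node-exporter-static", instance="caneast-site1-node1"}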
# caneast-site1-ot2-cam01 (Pi 5)
up{job="node-exporter-cam01", instance="caneast-site1-ot2-cam01"}
node_cpu_seconds_total{instance="caneast-site1-ot2-cam01"}
node_memory_MemAvailable_bytes{instance="caneast-site1-ot2-cam01"}
node_filesystem_avail_bytes{instance="caneast-site1-ot2-cam01"}
On the Prometheus Targets page, filter by node-exporter-cam01 and confirm that the target shows UP.
This requires both a run of the cam01-monitoring.yml playbook and a Helm upgrade that applies the new scrape config.
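The same check can be scripted against the Prometheus HTTP API instead of the Targets page. This is a minimal sketch, assuming Prometheus is reachable at `PROM_URL` (for example through a port-forward) and that `curl` and `grep` are available; `PROM_URL` and `target_is_up` are hypothetical names, and the JSON is matched textually rather than parsed.

```shell
# Assumption: Prometheus is reachable at PROM_URL, e.g. through a port-forward.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# target_is_up JOB: succeeds if at least one series of up{job="JOB"}
# currently has the value 1.
target_is_up() {
  job="$1"
  # /api/v1/query is the Prometheus instant-query endpoint; -G sends the
  # URL-encoded query as a GET parameter.
  curl -sG --data-urlencode "query=up{job=\"${job}\"}" \
    "${PROM_URL}/api/v1/query" |
    grep -Eq '"value":\[[0-9.]+,"1"\]'
}
```

For example, `target_is_up node-exporter-cam01 && echo UP || echo DOWN` reports the cam01 target; the same call works for node-exporter-static and node-exporter.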