Node Exporter Coverage

WI-330, WI-365, WI-408 | ADR-0038 | Sprint 3

Summary

IT and OT nodes expose metrics that the Prometheus instance in archon-monitoring scrapes. Four IT nodes (caneast-site1-node2 through node5) run node_exporter from the k3s DaemonSet; caneast-site1-node1 and caneast-site1-ot2-cam01 run it as a systemd service deployed by Ansible. Scraping of the CanEast AI Node is deferred (WI-385: personal workstation, Phase 5+).

| Node | Zone | Exporter | Method | Port | Prometheus job |
|---|---|---|---|---|---|
| caneast-site1-node2 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node3 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node4 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node5 | IT | node_exporter | k3s DaemonSet | 9100 | node-exporter |
| caneast-site1-node1 | IT | node_exporter | Ansible systemd | 9100 | node-exporter-static |
| caneast-site1-ot2-cam01 | OT-2 | node_exporter | Ansible systemd | 9100 | node-exporter-cam01 |
| alienware (Windows) | IT (deferred) | windows_exporter | Manual MSI | 9100 | alienware-host (removed WI-385) |
| alienware (WSL) | IT (deferred) | node_exporter | Manual systemd | 9101 | alienware-wsl (removed WI-385) |

k3s Nodes (DaemonSet)

The kube-prometheus-stack Helm chart deploys prometheus-node-exporter as a DaemonSet on all k3s nodes (caneast-site1-node2/3/4/5). No additional configuration is required.

kubectl get ds -n archon-monitoring kube-prometheus-stack-prometheus-node-exporter
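
The kubectl check above can also be scripted. The following is a minimal sketch, not part of the repository: `daemonset_ready` and `fetch_daemonset` are hypothetical helper names, and the expected count of 4 is taken from the four DaemonSet nodes listed in the summary. It relies only on the standard `desiredNumberScheduled` and `numberReady` fields of the DaemonSet status.

```python
import json
import subprocess


def daemonset_ready(ds: dict, expected_nodes: int) -> bool:
    """Return True when the DaemonSet targets the expected node count
    and every scheduled pod reports ready."""
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    return desired == expected_nodes and ready == desired


def fetch_daemonset(namespace: str, name: str) -> dict:
    """Fetch the DaemonSet object as JSON via kubectl (must be on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "ds", "-n", namespace, name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `daemonset_ready(fetch_daemonset("archon-monitoring", "kube-prometheus-stack-prometheus-node-exporter"), 4)` should be true when all four k3s nodes run a ready exporter pod.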

caneast-site1-node1 (RPi4, arm64)

Managed by the node_exporter Ansible role in archon-platform. The role auto-detects architecture (aarch64 -> arm64, x86_64 -> amd64).

ANSIBLE_PRIVATE_KEY_FILE=~/.ssh/ansible-svc-account \
  .venv/bin/ansible-playbook \
  ansible/playbooks/it/node-exporter.yml \
  -i ansible/inventories/it/hosts.yml

caneast-site1-node1 uses SSH port 22 (not 2222) and is in the standalone_nodes inventory group. Version is pinned in ansible/roles/node_exporter/defaults/main.yml.
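
The architecture auto-detection mentioned above amounts to a small lookup from the kernel machine name to the architecture suffix used in upstream node_exporter release archives. The role implements this in Ansible; the Python below is only an illustrative sketch, and `ARCH_MAP`, `release_arch`, and `archive_name` are hypothetical names. The version passed in is a parameter here, since the real pin lives in the role defaults.

```python
# Machine name as reported by `uname -m` -> release architecture suffix.
# Only the two mappings named in this page are included.
ARCH_MAP = {"aarch64": "arm64", "x86_64": "amd64"}


def release_arch(machine: str) -> str:
    """Return the release architecture for a `uname -m` style machine name."""
    try:
        return ARCH_MAP[machine]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None


def archive_name(version: str, machine: str) -> str:
    """Build the upstream tarball name for a version and machine,
    e.g. node_exporter-1.8.2.linux-arm64.tar.gz for ("1.8.2", "aarch64")."""
    return f"node_exporter-{version}.linux-{release_arch(machine)}.tar.gz"
```

Failing on an unknown machine name, rather than falling back to a default, keeps a mis-detected host from silently receiving the wrong binary.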

CanEast AI Node two-layer monitoring (WI-365)

CanEast AI Node runs a Windows host and a WSL subsystem. Each layer has its own exporter, so Windows OS metrics and Linux dev-environment metrics can be queried independently.

| Layer | Exporter | Port | Job | Extra labels |
|---|---|---|---|---|
| Windows host | windows_exporter v0.30.5 | 9100 | alienware-host | host=alienware, layer=windows |
| WSL subsystem | node_exporter v1.8.2 | 9101 | alienware-wsl | host=alienware-wsl, layer=wsl, parent_host=alienware |

WSL port 9101 is forwarded to REDACTED:[REDACTED] via netsh interface portproxy on the Windows host. Because the WSL IP address changes on restart, a scheduled task refreshes the portproxy rule each time.

Full deployment steps: CanEast AI Node Monitoring Setup runbook

Prometheus scrape config

Targets outside the k3s DaemonSet, including the CanEast AI Node, are defined in additionalScrapeConfigs under prometheus.prometheusSpec in kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml:

additionalScrapeConfigs:
  - job_name: node-exporter-static
    static_configs:
      - targets:
          - REDACTED:[REDACTED]  # caneast-site1-node1
  - job_name: alienware-host
    static_configs:
      - targets:
          - REDACTED:[REDACTED]
        labels:
          host: alienware
          layer: windows
  - job_name: alienware-wsl
    static_configs:
      - targets:
          - REDACTED:[REDACTED]
        labels:
          host: alienware-wsl
          layer: wsl
          parent_host: alienware
  - job_name: cadvisor-caneast-site1-node2
    static_configs:
      - targets:
          - REDACTED:[REDACTED]

OT Nodes (Ansible systemd)

caneast-site1-ot2-cam01 (Pi 5, OT zone ot-zone) -- WI-408

Managed by the node_exporter Ansible role in archon-platform, deployed via ansible/playbooks/ot/cam01-monitoring.yml. Targets the ot-zone inventory group.

.venv/bin/ansible-playbook \
  ansible/playbooks/ot/cam01-monitoring.yml \
  -i ansible/inventories/ot/hosts.yml

Metrics collected: CPU, RAM, disk, thermal, and network. No textfile collector is configured. Scrape interval: default (60s). Job name: node-exporter-cam01.

Prometheus scrape config (cam01 entry)

In kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml:

- job_name: node-exporter-cam01
  static_configs:
    - targets:
        - REDACTED:[REDACTED]  # caneast-site1-ot2-cam01 (Pi 5, OT zone ot-zone)
  relabel_configs:
    - source_labels: [__address__]
      target_label: instance
      replacement: caneast-site1-ot2-cam01
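
The relabel rule above overwrites the instance label, which would otherwise default to the scrape address, with a fixed hostname. The sketch below is a simplified Python model of a Prometheus relabel rule with the default `replace` action, meant only to show why the rule yields `instance="caneast-site1-ot2-cam01"`. `relabel_replace` is a hypothetical name, the address in the usage note is a placeholder, and the model handles only `$1`-style references, not the full relabeling feature set.

```python
import re


def relabel_replace(labels, source_labels, target_label, replacement,
                    regex="(.*)", separator=";"):
    """Simplified model of a relabel rule with action: replace."""
    # Source label values are joined with the separator before matching.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, hence fullmatch.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate $1-style references into Python's \g<1> template syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result
```

With the default regex `(.*)`, any address matches, so `relabel_replace({"__address__": "192.0.2.10:9100"}, ["__address__"], "instance", "caneast-site1-ot2-cam01")` sets instance to the hostname regardless of the target address.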

Verification

# All k3s DaemonSet nodes
up{job="node-exporter"}

# caneast-site1-node1 (only target in node-exporter-static; instance is the scrape address unless relabelled)
up{job="node-exporter-static"}

# caneast-site1-ot2-cam01 (Pi 5)
up{job="node-exporter-cam01", instance="caneast-site1-ot2-cam01"}
node_cpu_seconds_total{instance="caneast-site1-ot2-cam01"}
node_memory_MemAvailable_bytes{instance="caneast-site1-ot2-cam01"}
node_filesystem_avail_bytes{instance="caneast-site1-ot2-cam01"}

On the Prometheus Targets page, filter by node-exporter-cam01 and confirm the target shows UP. The target appears only after both the cam01-monitoring.yml playbook run and a Helm upgrade that applies the new scrape config.
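
The same check can be run against the Prometheus HTTP API instead of the UI. This is a minimal sketch using only the Python standard library and the instant-query endpoint (`/api/v1/query`). The function names are hypothetical, and the base URL is an assumption: it presumes Prometheus is reachable locally, for instance through a port-forward.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def up_by_instance(payload: dict) -> dict:
    """Map each instance label to True/False from an `up` query result."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("instance", ""): sample["value"][1] == "1"
        for sample in payload["data"]["result"]
    }


def check_job(base_url: str, job: str) -> dict:
    """Query up{job="<job>"} and report which instances are up."""
    url = query_url(base_url, f'up{{job="{job}"}}')
    with urllib.request.urlopen(url, timeout=10) as response:
        return up_by_instance(json.load(response))
```

Assuming Prometheus is reachable at `http://localhost:9090`, `check_job("http://localhost:9090", "node-exporter-cam01")` returns a mapping of instance name to up status. An empty mapping means the job has no targets yet, which usually indicates the Helm upgrade has not been applied.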