cAdvisor on caneast-site1-node2

WI-331 | ADR-0038 | Sprint 3

Overview

cAdvisor is deployed on caneast-site1-node2 as a Docker container to expose container-level metrics for the legacy Docker workload stack running outside of k3s.

caneast-site1-node2 runs both:

  • k3s worker node workloads, covered by the kubelet cAdvisor scrape via kube-prometheus-stack
  • Legacy Docker workloads (Portainer, Atlas CMMS, InfluxDB, Grafana OT, etc.), covered by this container

Deployment

docker run -d \
  --name cadvisor \
  --restart=unless-stopped \
  -p 18080:[REDACTED] \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:v0.49.1

The host port is 18080 rather than cAdvisor's default of 8080, because port 8080 on this node is already occupied by the Atlas CMMS backend.
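Before settling on a host port, it can help to confirm which ports are already taken on the node. The following is a minimal sketch of such a check, assuming `ss` (from iproute2) is installed; `port_is_listening` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds (exit 0)
# if any listening socket is bound to the port given as the first argument.
port_is_listening() {
  awk -v p="$1" '
    # Column 4 is "Local Address:Port"; the port is the text after the last colon.
    { n = split($4, parts, ":"); if (parts[n] == p) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

A possible use on the node would be `ss -ltn | port_is_listening 8080 && echo "8080 is taken"`. Reading from stdin keeps the helper independent of how the socket list is obtained.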

Prometheus Scrape Config

The scrape job is defined as an entry under additionalScrapeConfigs in kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml:

- job_name: cadvisor-caneast-site1-node2
  static_configs:
    - targets:
        - REDACTED:[REDACTED]
  relabel_configs:
    - source_labels: [__address__]
      target_label: instance
      replacement: caneast-site1-node2
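One way to validate a scrape job like the one above before committing it to values.yaml is to wrap the fragment in a standalone Prometheus config and run `promtool check config` on it. The sketch below assumes `promtool` is installed locally; `wrap_scrape_fragment` and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: reads a list of scrape jobs on stdin and prints a
# minimal Prometheus config with those jobs nested under scrape_configs.
wrap_scrape_fragment() {
  printf 'scrape_configs:\n'
  # Indent every input line by two spaces so the list sits under the key.
  sed 's/^/  /'
}
```

For example, `wrap_scrape_fragment < cadvisor-job.yaml > /tmp/prometheus-check.yaml && promtool check config /tmp/prometheus-check.yaml`. This only checks syntax and field names; it does not confirm that the target is reachable.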

Verification

# Health check from caneast-site1-node2
curl http://localhost:[REDACTED]/healthz
# → ok

# Metrics endpoint
curl -s http://localhost:[REDACTED]/metrics | head -5
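Beyond eyeballing the first few lines, a quick sanity check is to count how many container-level samples the endpoint returns. This is a minimal sketch that relies on cAdvisor's container metrics using the `container_` name prefix; `count_container_samples` and the `CADVISOR_URL` variable in the usage note are hypothetical names.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the number of sample lines whose metric name starts with "container_".
# Comment lines (# HELP / # TYPE) do not match because they start with "#".
count_container_samples() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the exit status clean.
  grep -c '^container_' || true
}
```

A possible invocation is `curl -s "$CADVISOR_URL/metrics" | count_container_samples`, where `CADVISOR_URL` points at the cAdvisor endpoint on the node. A result of 0 would suggest cAdvisor is up but not seeing the Docker workloads, for example because a volume mount is missing.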

From Prometheus:

REDACTED{instance="caneast-site1-node2"}

Notes

  • The image is pinned at v0.49.1; upgrade deliberately and test before rolling out
  • The container restarts automatically unless it has been explicitly stopped (--restart=unless-stopped)
  • /dev/kmsg device pass-through is required for OOM (out-of-memory) event metrics
  • k3s worker cAdvisor is scraped separately via the kubelet ServiceMonitor included in kube-prometheus-stack
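Since the image is pinned, it may be useful to detect drift between the running container and the pinned tag. The following is a small sketch, assuming the container is named cadvisor as in the deployment command; `check_pinned_image` is a hypothetical helper.

```shell
# Hypothetical helper: compares the expected (pinned) image reference with the
# one actually in use. Prints "pinned" on a match; otherwise reports the drift
# and returns a non-zero status.
check_pinned_image() {
  expected="$1"
  actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "pinned"
  else
    echo "drift: running $actual, expected $expected"
    return 1
  fi
}
```

On the node this could be called as `check_pinned_image gcr.io/cadvisor/cadvisor:v0.49.1 "$(docker inspect --format '{{.Config.Image}}' cadvisor)"`.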