Skip to content

CanEast AI Node Monitoring Setup

Two-layer monitoring for the CanEast AI Node dev workstation (REDACTED).

  • windows_exporter on the Windows host at port 9100 (Prometheus job: alienware-host)
  • node_exporter in the WSL subsystem at port 9101 (Prometheus job: alienware-wsl)

WSL port 9101 is forwarded to the Windows host via netsh interface portproxy so Prometheus scrapes both exporters at the same IP. A scheduled task refreshes the portproxy rule at startup because the WSL IP changes on restart.

WI: WI-365 (child of WI-330)
ADR: ADR-0038
Coverage doc: Node Exporter Coverage


Part 1: Enable WSL systemd

Run in a WSL terminal (sudo required):

sudo tee /etc/wsl.conf << 'EOF'
[boot]
systemd=true
command = service cron start
EOF

Then from Windows (PowerShell or Command Prompt):

wsl --shutdown

Wait a few seconds, then open a new WSL terminal and verify:

ps -p 1 -o comm=
# expected: systemd

Part 2: Install node_exporter in WSL on port 9101

Run in a WSL terminal after the systemd restart:

# Install binary to system path
sudo cp /home/operator/.local/bin/node_exporter /usr/local/bin/node_exporter
sudo chown root:root /usr/local/bin/node_exporter
sudo chmod 755 /usr/local/bin/node_exporter

# Create dedicated user
sudo useradd --system --no-create-home --shell /bin/false node_exporter

# Create systemd unit
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Prometheus Node Exporter (WSL subsystem)
After=network.target

[Service]
Type=simple
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:[REDACTED]
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

Verify:

curl -s http://localhost:[REDACTED]/metrics | head -5

Part 3: Install windows_exporter on the Windows host

Run in PowerShell as Administrator:

$ver = "0.30.5"
$url = "https://github.com/prometheus-community/windows_exporter/releases/download/v$ver/windows_exporter-$ver-amd64.msi"
$dest = "$env:TEMP\windows_exporter-$ver-amd64.msi"
Invoke-WebRequest -Uri $url -OutFile $dest

msiexec /i $dest /qn LISTEN_PORT=9100 `
  ENABLED_COLLECTORS="cpu,cs,logical_disk,net,os,service,system,memory"

New-NetFirewallRule `
  -DisplayName "windows_exporter (Prometheus)" `
  -Direction Inbound -Protocol TCP -LocalPort 9100 `
  -Action Allow -Profile Any

Verify from WSL:

curl -s http://REDACTED:[REDACTED]/metrics | head -5

Part 4: Port-forward WSL 9101 to Windows host

WSL IP changes on restart; the persistence task (below) handles refresh automatically.

Initial setup

Run in PowerShell as Administrator:

$wslIp = (wsl hostname -I).Trim().Split(' ')[0]
Write-Host "WSL IP: $wslIp"

netsh interface portproxy add v4tov4 `
  listenport=9101 listenaddress=0.0.0.0 `
  connectport=9101 connectaddress=$wslIp

New-NetFirewallRule `
  -DisplayName "node_exporter WSL (Prometheus)" `
  -Direction Inbound -Protocol TCP -LocalPort 9101 `
  -Action Allow -Profile Any

netsh interface portproxy show v4tov4

Persistence scheduled task

New-Item -Path "C:\scripts" -ItemType Directory -Force

Set-Content -Path "C:\scripts\update-wsl-portproxy.ps1" -Value @'
$wslIp = (wsl hostname -I).Trim().Split(' ')[0]
netsh interface portproxy delete v4tov4 listenport=9101 listenaddress=0.0.0.0
netsh interface portproxy add v4tov4 listenport=9101 listenaddress=0.0.0.0 connectport=9101 connectaddress=$wslIp
'@

$action    = New-ScheduledTaskAction -Execute "powershell.exe" `
               -Argument "-NonInteractive -WindowStyle Hidden -File C:\scripts\update-wsl-portproxy.ps1"
$trigger   = New-ScheduledTaskTrigger -AtStartup
$principal = New-ScheduledTaskPrincipal -UserId "SYSTEM" -RunLevel Highest
$settings  = New-ScheduledTaskSettingsSet -ExecutionTimeLimit (New-TimeSpan -Minutes 5)

Register-ScheduledTask -TaskName "WSL-PortProxy-Update" `
  -Action $action -Trigger $trigger -Principal $principal -Settings $settings -Force

Part 5: Apply Helm upgrade

Once both exporters are responding, upgrade kube-prometheus-stack to activate the new scrape jobs (from CanEast AI Node WSL):

helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 84.3.0 \
  --namespace archon-monitoring \
  --values kubernetes/archon-monitoring/kube-prometheus-stack/values.yaml \
  --wait --timeout 10m

Part 6: Verify

up{instance=~"alienware|alienware-wsl"}
windows_cpu_time_total{instance="alienware"}
node_cpu_seconds_total{instance="alienware-wsl"}

Prometheus Targets: confirm alienware-host and alienware-wsl both show UP.


Troubleshooting

systemd not active after wsl --shutdown

# Check init process
ps -p 1 -o comm=

# Verify wsl.conf
cat /etc/wsl.conf
# Must contain systemd=true under [boot]

Run wsl --shutdown again from Windows and wait 8+ seconds before reopening WSL.

portproxy not forwarding

# Show current rules
netsh interface portproxy show v4tov4

# Check WSL IP vs rule
wsl hostname -I

# Refresh rule manually
powershell -File C:\scripts\update-wsl-portproxy.ps1

Prometheus target stays DOWN

# Test from k3s cluster
kubectl run curl-test --image=curlimages/curl --restart=Never --rm -it \
  --command -- curl -s http://REDACTED:[REDACTED]/metrics 2>&1 | head -3

kubectl run curl-test --image=curlimages/curl --restart=Never --rm -it \
  --command -- curl -s http://REDACTED:[REDACTED]/metrics 2>&1 | head -3