Ollama on CanEast AI Node WSL

Status: Tier 3 Operational Reference
Last Updated: 2026-05-02
WI: WI-395

Overview

Ollama runs on the CanEast AI Node workstation (WSL 2 + Ubuntu 24.04) and is bound to the LAN IP REDACTED:11434. This deployment serves as the Tier 1 inference backend for all sovereign AI operations on the Archon Platform, including caneast-site1-ot2-cam01 vision inference.

Bind Configuration

Container name: ollama (Docker container on CanEast AI Node WSL)

Environment variable: OLLAMA_HOST=0.0.0.0:11434

Exposed port: REDACTED:11434 (LAN-bound, not loopback, not wildcard)

Network reachability:

  • From caneast-site1-node4 (k3s control plane): Yes (REDACTED is routable via REDACTED/24)
  • From caneast-site1-ot2-cam01 (Pi 5 on Wi-Fi): Yes
  • From caneast-site1-mqtt1 (MQTT broker): Yes
  • From CanEast AI Node localhost: No (loopback not bound; use Docker internal networking instead)
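The reachability list above can be spot-checked from any of those hosts with a plain TCP probe. This is a minimal sketch, not part of the deployed tooling: the function name `can_reach` and the 3-second timeout are assumptions, and the host argument stands in for the redacted LAN IP.

```python
import socket


def can_reach(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, DNS failure, and timeouts.
        return False
```

Run on the workstation itself against 127.0.0.1, this should return False, which matches the "loopback not bound" entry rather than indicating a fault.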

Current Model Inventory (2026-05-02)

Text Models

| Model | Size | Status | Notes |
|---|---|---|---|
| qwen3.5:4b | 2.6 GB | Ready | Tier 2 fallback if qwen3-vl:4b unavailable |
| qwen3:4b | 3.0 GB | Ready | Legacy; being phased out for qwen3.5 |
| glm-4.7-flash | 1.8 GB | Ready | Lightweight; good for fast inference |
| llama3.2 | 2.0 GB | Ready | General-purpose text generation |

Vision-Language (VL) Models

| Model | Size | Status | Notes |
|---|---|---|---|
| qwen3-vl:4b | 7.8 GB | Pulling... | Primary Tier 1 for cam01 inference |
| qwen3-vl:2b | 4.1 GB | Not pulled | Fallback if 4b OOM (unlikely; 4GB Pi 5 can share) |

Pulling status: As of 2026-05-02, qwen3-vl:4b is being pulled to local disk. Monitor with:

docker exec ollama ollama list

When the pull completes, expect a 7.8 GB model file in /var/lib/docker/volumes/ollama_models/_data.
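For unattended monitoring, the same check can be made over HTTP by polling /api/tags until the model appears. The sketch below assumes a model is listed only once its pull has finished; `has_model`, `wait_for_model`, the 30-second interval, and the check limit are illustrative names and values, not existing tooling.

```python
import json
import time
import urllib.request


def has_model(tags: dict, name: str) -> bool:
    """Return True if the /api/tags payload lists a model called name."""
    return any(m.get("name") == name for m in tags.get("models", []))


def wait_for_model(base_url: str, name: str, interval_s: float = 30.0,
                   max_checks: int = 120) -> bool:
    """Poll /api/tags until name appears or max_checks is exhausted."""
    for _ in range(max_checks):
        try:
            with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
                if has_model(json.load(resp), name):
                    return True
        except OSError:
            pass  # Ollama unreachable right now; try again on the next pass.
        time.sleep(interval_s)
    return False
```

A call such as `wait_for_model("http://REDACTED:11434", "qwen3-vl:4b")` would block for up to an hour with these defaults.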

Why CanEast AI Node (Not caneast-site1-node2/3/4)

| Aspect | CanEast AI Node | caneast-site1-node2 | caneast-site1-node4 |
|---|---|---|---|
| GPU | NVIDIA RTX (FP32 fast) | None | Intel UHD 630 (slow for LLM) |
| Memory | 32 GB | 8 GB | 16 GB |
| Always-on | No (dev workstation) | Yes | Yes |
| Use case | AI ops, eval, dev | Docker apps, NVR | k3s control plane |

Selected: CanEast AI Node - GPU acceleration and sufficient memory for larger models override the always-on requirement (dev sessions cover most hours).

Trade-off: When CanEast AI Node is suspended, Tier 1 inference fails and fallback to cloud tiers kicks in. This is acceptable because:

  1. Zone OT-2 (cam01) permits cloud fallback per LLMOPS-0002
  2. Ops monitoring will flag repeated Tier 1 failures
  3. Future migration to dedicated Ollama GPU node is a Phase 2 decision

Known Limitations

CanEast AI Node Is Not a 24/7 Server

The CanEast AI Node workstation is a dev machine, not production infrastructure. When suspended or in sleep mode:

  • Ollama container remains running (Docker daemon is WSL-managed)
  • But the WSL VM itself may suspend, pausing all processes
  • Network stack may become unresponsive

Operational impact:

  • Early morning (before 7 AM) when workstation is off: Tier 1 fails for ~30 min
  • Workday (7 AM - 6 PM): Ollama is reliably available
  • Evening: Depends on active dev sessions

Monitoring: caneast-site1-node4 k8sgpt operator will detect unreachable Ollama and escalate; caneast-site1-ot2-cam01 capture.py will automatically retry cloud tiers.
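The retry behaviour described above amounts to trying an ordered list of backends and taking the first success. The internals of capture.py are not shown in this document, so the following is only a sketch of that pattern; `infer_with_fallback`, the `Tier` alias, and the tier names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier is a label plus a callable that takes a JPEG frame and returns text.
Tier = Tuple[str, Callable[[bytes], str]]


def infer_with_fallback(
    frame: bytes, tiers: Sequence[Tier]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each tier in order; return (tier_name, result) from the first success."""
    for name, call in tiers:
        try:
            return name, call(frame)
        except Exception as exc:  # Broad on purpose: any failure moves to the next tier.
            print(f"{name} failed: {exc!r}")
    return None, None  # Every tier failed.
```

Placing the Ollama call first and the cloud tiers after it gives the Tier 1 preference; a suspended workstation then costs one connection timeout per frame before the next tier is tried.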

Model Selection Constraints

  • Only models tested to fit in 32 GB VRAM are deployed
  • qwen3-vl:4b is near the limit; larger VL models (qwen3-vl:7b, llama4-vision:13b) are not candidates
  • Text models are kept lightweight (<=4b params) to leave GPU memory for simultaneous requests

Verification

Check Ollama health:

curl http://REDACTED:[REDACTED]/api/tags

Expected response (example):

{
  "models": [
    {
      "name": "qwen3.5:4b",
      "modified_at": "2026-05-01T10:30:00Z",
      "size": 2669856768
    },
    {
      "name": "qwen3-vl:4b",
      "modified_at": "2026-05-02T14:00:00Z",
      "size": 8374390784
    }
  ]
}

If no response or connection refused, Ollama is down. Check Docker:

docker ps | grep ollama
docker logs ollama

Model Management

Pull a New Model

curl http://REDACTED:[REDACTED]/api/pull -d '{"name":"<model-name>"}'

Example (pull qwen3.5:7b):

curl http://REDACTED:[REDACTED]/api/pull -d '{"name":"qwen3.5:7b"}'

This is a long-running operation; by default the endpoint streams JSON progress updates, and curl does not return until the model is fully downloaded and placed in the ollama_models volume.
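When a pull is driven from a script rather than curl, the streamed progress lines can be reduced to short status messages. This sketch assumes each line is a JSON object with a "status" field, optional "total" and "completed" byte counts, and an "error" field on failure; `pull_progress` and `pull_model` are illustrative names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def pull_progress(lines: Iterable[bytes]) -> Iterator[str]:
    """Turn streamed /api/pull JSON lines into short progress strings."""
    for raw in lines:
        if not raw.strip():
            continue  # Skip blank keep-alive lines.
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        total, done = event.get("total"), event.get("completed")
        if total and done is not None:
            yield f"{event.get('status', '')}: {100 * done / total:.0f}%"
        else:
            yield event.get("status", "")


def pull_model(base_url: str, model: str) -> None:
    """POST to /api/pull and print progress until the stream ends."""
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"name": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # Sent as POST because data is set.
        for message in pull_progress(resp):
            print(message)
```

Keeping the parser separate from the network call means it can be exercised against recorded output without touching the server.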

Delete a Model

curl -X DELETE http://REDACTED:[REDACTED]/api/delete -d '{"name":"<model-name>"}'

Example (remove qwen3:4b to save space):

curl -X DELETE http://REDACTED:[REDACTED]/api/delete -d '{"name":"qwen3:4b"}'

List All Models

curl http://REDACTED:[REDACTED]/api/tags | jq '.models[] | {name, size}'

Integration Points

caneast-site1-ot2-cam01 Vision Inference

The cam01 capture.py process POSTs frames to Ollama's /api/generate endpoint:

curl http://REDACTED:[REDACTED]/api/generate \
  -d '{
    "model": "qwen3-vl:4b",
    "prompt": "Describe what you see in this image in the context of a residential refrigerator. Return JSON.",
    "images": ["base64-encoded-jpeg"],
    "stream": false
  }'
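The same request can be built in Python. This is a sketch of how a client might assemble it, not the actual capture.py code; `build_generate_request` is a hypothetical helper, and the base URL is passed in because the LAN address is redacted here.

```python
import base64
import json
import urllib.request


def build_generate_request(base_url: str, jpeg: bytes, prompt: str,
                           model: str = "qwen3-vl:4b") -> urllib.request.Request:
    """Build a non-streaming POST to /api/generate with one base64-encoded frame."""
    body = {
        "model": model,
        "prompt": prompt,
        # The API expects base64 text, not raw JPEG bytes.
        "images": [base64.b64encode(jpeg).decode("ascii")],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=...)` and reading `json.load(resp)["response"]` yields the generated text; an explicit timeout matters here because a suspended workstation would otherwise stall the caller.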

k8sgpt Operator (Future)

When k8sgpt-operator is deployed to archon-monitoring namespace, it will use Ollama as the LLM backend for cluster diagnostics.

Connection string: http://REDACTED:11434

Models: k8sgpt auto-selects based on available models; currently would default to qwen3:4b or llama3.2 (text generation), later to qwen3-vl:4b if available.

Troubleshooting

| Symptom | Diagnosis | Fix |
|---|---|---|
| curl: (7) Failed to connect | CanEast AI Node offline or WSL suspended | Resume CanEast AI Node; start Docker daemon if needed |
| curl: (28) Operation timeout | Ollama responding slowly or hung | Check docker logs ollama; restart container if needed |
| Model very slow (>30s) | OOM or resource contention on CanEast AI Node | Check GPU memory with nvidia-smi; reduce concurrent requests |
| curl returns empty or null | Model not found | Run /api/tags to confirm model is loaded; pull if missing |

Related Documentation

  • docs/platform/frigate-caneast-site1-node4.md - Frigate NVR (consumes RTSP from Pi 5, not Ollama output)
  • docs/architecture/cam01-capture-pipeline.md - cam01 inference pipeline (Ollama as Tier 1)
  • docs/internal/vl-inference-cam01.md - 5-tier inference fallback chain details
  • docs/adr/it/LLMOPS/LLMOPS-0002-agent-plane.md - LLMOPS decisions and Ollama placement rationale