Ollama on CanEast AI Node WSL

Status: Tier 3 Operational Reference
Last Updated: 2026-05-02
WI: WI-395

Overview

Ollama runs on the CanEast AI Node workstation (WSL 2 + Ubuntu 24.04) and is bound to the LAN IP REDACTED:11434. This deployment serves as the Tier 1 inference backend for all sovereign AI operations on the Archon Platform, including caneast-site1-ot2-cam01 vision inference.

Bind Configuration

Container name: ollama (Docker container on CanEast AI Node WSL)

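Environment variable (inside the container): OLLAMA_HOST=0.0.0.0:11434, so Ollama listens on every interface in the container's own network namespace

Exposed port: REDACTED:11434 on the host (the published port is bound to the LAN IP only: not loopback and not the host wildcard; the 0.0.0.0 above applies inside the container)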

Network reachability:
- From caneast-site1-node4 (k3s control plane): Yes (REDACTED is routable via REDACTED/24)
- From caneast-site1-ot2-cam01 (Pi 5 on Wi-Fi): Yes
- From caneast-site1-mqtt1 (MQTT broker): Yes
- From CanEast AI Node localhost: No (loopback not bound; use Docker internal networking instead)
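A plain TCP connect to port 11434 is enough to check a row of the reachability list from a given host. It separates "bound and routable" from "refused or unreachable". The following minimal Python sketch uses only the standard library. The function name `is_reachable` and the 2-second timeout are illustrative choices and are not part of the deployment.

```python
import socket


def is_reachable(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    This only shows that something is accepting connections on the port;
    it does not prove Ollama is healthy (GET /api/tags checks that).
    """
    try:
        # create_connection resolves the host and performs the TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, no route to host, and timeouts
        # (socket.timeout is an OSError subclass) all land here.
        return False
```

For example, `is_reachable("<AI Node LAN IP>")` run from caneast-site1-node4 should return True according to the list above. The same check against 127.0.0.1 on the AI Node itself should return False, which matches the localhost row.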

Current Model Inventory (2026-05-02)

Text Models

Model	Size	Status	Notes
qwen3.5:4b	2.6 GB	Ready	Tier 2 fallback if qwen3-vl:4b unavailable
qwen3:4b	3.0 GB	Ready	Legacy; being phased out in favor of qwen3.5
glm-4.7-flash	1.8 GB	Ready	Lightweight; good for fast inference
llama3.2	2.0 GB	Ready	General-purpose text generation

Vision-Language (VL) Models

Model	Size	Status	Notes
qwen3-vl:4b	7.8 GB	Pulling...	Primary Tier 1 for cam01 inference
qwen3-vl:2b	4.1 GB	Not pulled	Fallback if qwen3-vl:4b runs out of GPU memory on the AI Node (unlikely)

Pulling status: As of 2026-05-02, qwen3-vl:4b is being pulled to local disk. A model appears in the list only after its pull completes, so check for it with:

docker exec ollama ollama list

When the pull completes, expect about 7.8 GB of new model data (blob files) in the ollama_models volume under /var/lib/docker/volumes/ollama_models/_data.
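While qwen3-vl:4b is still downloading, it is absent from /api/tags. A client should therefore check which VL model is actually available before it routes frames. The following Python sketch assumes the caller already has the list of pulled model names, for example from /api/tags. The preference order and the helper names `normalize` and `pick_vl_model` are hypothetical. The `:latest` handling reflects Ollama's default tag for untagged names.

```python
from typing import Iterable, Optional

# Assumed preference order, taken from the VL inventory table:
# primary Tier 1 model first, then the smaller fallback.
VL_PREFERENCE = ("qwen3-vl:4b", "qwen3-vl:2b")


def normalize(name: str) -> str:
    """Append Ollama's default ':latest' tag when a name has no tag."""
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def pick_vl_model(pulled: Iterable[str],
                  preference: Iterable[str] = VL_PREFERENCE) -> Optional[str]:
    """Return the first preferred model that has finished pulling, else None."""
    available = {normalize(n) for n in pulled}
    for candidate in preference:
        if normalize(candidate) in available:
            return candidate
    return None
```

While the 4b pull is in progress and the 2b model has not been pulled, `pick_vl_model` returns None. In that case the caller would route frames to the next inference tier.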

Why CanEast AI Node (Not caneast-site1-node2/3/4)

Aspect	CanEast AI Node	caneast-site1-node2	caneast-site1-node4
GPU	NVIDIA RTX (FP32 fast)	None	Intel UHD 630 (slow for LLM)
Memory	32 GB	8 GB	16 GB
Always-on	No (dev workstation)	Yes	Yes
Use case	AI ops, eval, dev	Docker apps, NVR	k3s control plane

Selected: CanEast AI Node. GPU acceleration and enough memory for larger models outweigh the always-on requirement (dev sessions cover most hours).

Trade-off: When CanEast AI Node is suspended, Tier 1 inference fails and requests fall back to the cloud tiers. This is acceptable because:
1. Zone OT-2 (cam01) permits cloud fallback per LLMOPS-0002
2. Ops monitoring will flag repeated Tier 1 failures
3. A future migration to a dedicated Ollama GPU node is a Phase 2 decision

Known Limitations

CanEast AI Node Is Not a 24/7 Server

The CanEast AI Node workstation is a dev machine, not production infrastructure. When suspended or in sleep mode:

The Ollama container is not stopped, since the Docker daemon is WSL-managed
But the WSL VM itself may be suspended, pausing all processes, Ollama included
The network stack may become unresponsive

Operational impact:
- Early morning (before 7 AM), when the workstation is off: Tier 1 fails for ~30 min
- Workday (7 AM to 6 PM): Ollama is reliably available
- Evening: depends on active dev sessions

Monitoring: Once deployed (see k8sgpt Operator (Future) below), the k8sgpt operator on caneast-site1-node4 is expected to detect an unreachable Ollama and escalate. caneast-site1-ot2-cam01 capture.py automatically falls back to the cloud tiers.
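The fallback from Tier 1 to the cloud tiers amounts to an ordered list of tiers that is tried until one succeeds. The following minimal Python illustration rests on stated assumptions and is not capture.py's actual code. Each tier is a hypothetical `(label, callable)` pair, and only connection-level failures trigger a fallback. These failures are `OSError`, which also covers `urllib.error.URLError` and socket timeouts.

```python
from typing import Callable, List, Sequence, Tuple

# A tier is a (label, call) pair. `call` takes the request payload and
# returns the model's text, raising OSError when the backend is unreachable.
# Labels and callables are hypothetical stand-ins for capture.py's tiers.
Tier = Tuple[str, Callable[[dict], str]]


def infer_with_fallback(tiers: Sequence[Tier], payload: dict) -> Tuple[str, str]:
    """Try each tier in order and return (label, result) from the first success."""
    failures: List[str] = []
    for label, call in tiers:
        try:
            return label, call(payload)
        except OSError as exc:
            # URLError, ConnectionRefusedError and socket timeouts are all
            # OSError subclasses, so an offline AI Node falls through here.
            failures.append(f"{label}: {exc!r}")
    raise RuntimeError("all inference tiers failed: " + "; ".join(failures))
```

Returning the label of the tier that answered makes it easy to count how often Tier 1 is skipped, which is the signal ops monitoring needs.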

Model Selection Constraints

Only models tested to fit in the AI Node's GPU memory are deployed (the 32 GB in the node comparison is system RAM, not VRAM)
qwen3-vl:4b is near the limit; larger VL models (qwen3-vl:7b, llama4-vision:13b) are not candidates
Text models are kept lightweight (<=4b params) to leave GPU memory for simultaneous requests
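The co-residency constraint is simple arithmetic. The models that must be loaded at the same time, plus working memory, have to fit in VRAM. The following rough Python sketch makes two assumptions. First, on-disk model size approximates loaded weight size. Second, a caller-supplied headroom figure covers KV cache and runtime overhead. Real usage grows with context length and concurrency. The VRAM budget is passed in as a parameter.

```python
from typing import Iterable


def fits_in_vram(resident_gb: Iterable[float], vram_gb: float,
                 headroom_gb: float = 1.0) -> bool:
    """Rough check that co-resident models plus headroom fit in GPU memory.

    resident_gb: on-disk sizes of the models expected to be loaded together,
        used here as an approximation of their loaded weight size.
    headroom_gb: placeholder allowance for KV cache, context and runtime
        overhead; the real figure grows with context length and concurrency.
    """
    needed = sum(resident_gb) + headroom_gb
    return needed <= vram_gb
```

Using the inventory sizes, keeping qwen3-vl:4b (7.8 GB) and qwen3.5:4b (2.6 GB) loaded together needs 10.4 GB before any headroom. This is why small text models matter when a VL model is also resident.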

Verification

Check Ollama health:

curl http://REDACTED:[REDACTED]/api/tags

Expected response (example):

{
  "models": [
    {
      "name": "qwen3.5:4b",
      "modified_at": "2026-05-01T10:30:00Z",
      "size": 2669856768
    },
    {
      "name": "qwen3-vl:4b",
      "modified_at": "2026-05-02T14:00:00Z",
      "size": 8374390784
    }
  ]
}

If the request gets no response or the connection is refused, Ollama (or the WSL VM) is down. Check Docker:

docker ps | grep ollama
docker logs ollama
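The same health check can be scripted in Python with only the standard library. In this minimal sketch, `fetch_model_names` and `parse_tags` are hypothetical helpers. The fetch raises an `OSError` subclass when Ollama is unreachable, which mirrors the "no response or connection refused" case. Otherwise it returns the model names listed in the /api/tags response.

```python
import json
import urllib.request
from typing import List


def parse_tags(body: bytes) -> List[str]:
    """Extract model names from an /api/tags JSON response body."""
    payload = json.loads(body)
    return [m["name"] for m in payload.get("models", [])]


def fetch_model_names(base_url: str, timeout: float = 5.0) -> List[str]:
    """GET {base_url}/api/tags and return the pulled model names.

    Raises OSError (urllib.error.URLError or a socket timeout) when the
    AI Node, WSL VM, or container is not answering.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_tags(resp.read())
```

Usage: `fetch_model_names("http://<AI Node LAN IP>:11434")`.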

Model Management

Pull a New Model

curl http://REDACTED:[REDACTED]/api/pull -d '{"name":"<model-name>"}'

Example (pull the qwen3-vl:2b fallback):

curl http://REDACTED:[REDACTED]/api/pull -d '{"name":"qwen3-vl:2b"}'

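This is a long-running operation. By default the response streams one JSON progress object per line, and curl does not return until the model is fully downloaded into the ollama_models volume. Add "stream": false to the request body to get a single response instead.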
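A script can parse the streamed pull lines as they arrive. Each line is a JSON object with a `status` field. Layer downloads also carry `total` and `completed` byte counts, and a successful pull ends with `"status": "success"`. The following Python sketch uses only the standard library. `pull_progress` and `start_pull` are hypothetical helper names, and the percentage it reports is per layer, not for the whole model.

```python
import json
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

Progress = Tuple[str, Optional[float]]


def pull_progress(lines: Iterable[bytes],
                  on_update: Optional[Callable[[str, Optional[float]], None]] = None) -> Progress:
    """Parse a streaming /api/pull body (one JSON object per line).

    Returns the last status seen and the last per-layer completed/total
    percentage (None if no line carried byte counts).
    """
    status: str = ""
    percent: Optional[float] = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        status = event.get("status", status)
        total, done = event.get("total"), event.get("completed")
        if total and done is not None:
            percent = 100.0 * done / total
        if on_update is not None:
            on_update(status, percent)
    return status, percent


def start_pull(base_url: str, model: str) -> Progress:
    """POST /api/pull and print progress until the stream ends."""
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"name": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # No timeout: large pulls can legitimately run for many minutes.
    with urllib.request.urlopen(req) as resp:
        # Iterating an HTTP response yields it line by line.
        return pull_progress(resp, on_update=lambda s, p: print(s, p))
```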

Delete a Model

curl -X DELETE http://REDACTED:[REDACTED]/api/delete -d '{"name":"<model-name>"}'

Example (remove qwen3:4b to save space):

curl -X DELETE http://REDACTED:[REDACTED]/api/delete -d '{"name":"qwen3:4b"}'

List All Models

curl http://REDACTED:[REDACTED]/api/tags | jq '.models[] | {name, size}'
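The `size` field from /api/tags is in bytes. Tools differ on whether they print decimal gigabytes (GB, 10^9 bytes) or binary gibibytes (GiB, 2^30 bytes). For example, the 8374390784 bytes in the sample response is about 8.4 GB or 7.8 GiB, so the inventory's 7.8 GB for qwen3-vl:4b lines up with the binary figure. The following small Python sketch reports both units. The helper names are illustrative.

```python
import json
from typing import List, Tuple


def human_sizes(size_bytes: int) -> Tuple[str, str]:
    """Format a byte count as (decimal GB, binary GiB), one decimal place."""
    return f"{size_bytes / 1e9:.1f} GB", f"{size_bytes / 2**30:.1f} GiB"


def size_report(tags_body: bytes) -> List[Tuple[str, str, str]]:
    """Return (name, GB, GiB) for each model in an /api/tags response body."""
    models = json.loads(tags_body).get("models", [])
    return [(m["name"], *human_sizes(m["size"])) for m in models]
```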

Integration Points

caneast-site1-ot2-cam01 Vision Inference

The cam01 capture.py process POSTs frames to Ollama's /api/generate endpoint:

curl http://REDACTED:[REDACTED]/api/generate \
  -d '{
    "model": "qwen3-vl:4b",
    "prompt": "Describe what you see in this image in the context of a residential refrigerator. Return JSON.",
    "images": ["base64-encoded-jpeg"],
    "stream": false
  }'
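capture.py's actual request code is not shown on this page. The following Python sketch is a hypothetical equivalent of the curl call using only the standard library. It base64-encodes a JPEG, sends `stream: false`, and reads the `response` field of the single JSON reply. It also sets `format: "json"`, an optional Ollama request parameter, because the prompt asks for JSON. Whether capture.py sets this parameter is not documented.

```python
import base64
import json
import urllib.request


def build_generate_payload(jpeg_bytes: bytes, prompt: str,
                           model: str = "qwen3-vl:4b") -> dict:
    """Build an /api/generate body carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        # Plain base64 string, without a "data:image/jpeg;base64," prefix.
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,
        # Optional: ask Ollama to constrain the output to valid JSON.
        "format": "json",
    }


def generate(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With stream=false the reply is one JSON object; the model's text
        # is in its "response" field.
        return json.loads(resp.read())["response"]
```

For example, read a frame with `open("frame.jpg", "rb").read()` (a hypothetical file name), pass it to `build_generate_payload`, and send the result with `generate("http://<AI Node LAN IP>:11434", payload)`.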

k8sgpt Operator (Future)

When k8sgpt-operator is deployed to the archon-monitoring namespace, it will use Ollama as the LLM backend for cluster diagnostics.

Connection string: http://REDACTED:11434

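Models: k8sgpt uses the model set explicitly in its backend configuration; it does not auto-select from the models Ollama has pulled. A text model such as qwen3.5:4b or llama3.2 is the natural choice (qwen3:4b is being phased out). qwen3-vl:4b could be configured later once it is available.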

Troubleshooting

Symptom	Diagnosis	Fix
`curl: (7) Failed to connect`	CanEast AI Node offline or WSL suspended	Resume CanEast AI Node; start Docker daemon if needed
`curl: (28) Operation timed out`	Ollama responding slowly or hung	Check `docker logs ollama`; restart container if needed
Model very slow (>30s)	Model does not fit in VRAM (partly offloaded to CPU) or resource contention on CanEast AI Node	Check GPU memory with `nvidia-smi` and the CPU/GPU split with `docker exec ollama ollama ps`; reduce concurrent requests
`/api/generate` returns HTTP 404 with an `error` saying the model was not found	Model not pulled	Run `/api/tags` to confirm the model is pulled; pull if missing
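When the client is Python rather than curl, these symptoms surface as exceptions from `urllib`. The following sketch maps them onto the rows of the troubleshooting table. `diagnose` is a hypothetical helper, and its return strings paraphrase that table.

```python
import socket
import urllib.error


def diagnose(exc: BaseException) -> str:
    """Map a urllib/socket exception to a troubleshooting-table row."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "model not found: confirm with /api/tags, pull if missing"
        return f"HTTP {exc.code}: check docker logs ollama"
    # urlopen wraps connect-time socket errors in URLError.reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    # socket.timeout is an alias of TimeoutError on Python 3.10+.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timed out: Ollama slow or hung; check docker logs ollama"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused: AI Node offline or WSL suspended"
    return f"unclassified: {exc!r}"
```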

Related Documentation

docs/platform/frigate-caneast-site1-node4.md - Frigate NVR (consumes RTSP from Pi 5, not Ollama output)
docs/architecture/cam01-capture-pipeline.md - cam01 inference pipeline (Ollama as Tier 1)
docs/internal/vl-inference-cam01.md - 5-tier inference fallback chain details
docs/adr/it/LLMOPS/LLMOPS-0002-agent-plane.md - LLMOPS decisions and Ollama placement rationale