Ollama on CanEast AI Node WSL¶
Status: Tier 3 Operational Reference
Last Updated: 2026-05-02
WI: WI-395
Overview¶
Ollama runs on the CanEast AI Node workstation (WSL 2 + Ubuntu 24.04) and is bound to the LAN IP
REDACTED:11434. This deployment serves as the Tier 1 inference backend for all
sovereign AI operations on the Archon Platform, including caneast-site1-ot2-cam01 vision inference.
Bind Configuration¶
- Container name: ollama (Docker container on CanEast AI Node WSL)
- Environment variable: OLLAMA_HOST=0.0.0.0:11434 (listen address inside the container; host-side exposure is governed by the published port)
- Exposed port: REDACTED:11434 (LAN-bound, not loopback, not wildcard)

Network reachability:
- From caneast-site1-node4 (k3s control plane): Yes (REDACTED is routable via REDACTED/24)
- From caneast-site1-ot2-cam01 (Pi 5 on Wi-Fi): Yes
- From caneast-site1-mqtt1 (MQTT broker): Yes
- From CanEast AI Node localhost: No (loopback not bound; use Docker internal networking instead)
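The actual container launch command is not recorded in this document. As a minimal sketch, a LAN-bound publish like the one described above could be produced as follows. The image name `ollama/ollama`, the `--gpus all` flag (which needs the NVIDIA Container Toolkit), and the `/root/.ollama` mount path are assumptions, and `LAN_IP` is a placeholder for the redacted address.

```shell
# LAN_IP is a placeholder for the workstation's LAN address (REDACTED in this document).
LAN_IP="${LAN_IP:-REDACTED}"

start_ollama() {
  # -e sets the listen address inside the container (all interfaces).
  # -p binds the host side to the LAN IP only, so loopback and other
  #    host interfaces are not exposed.
  # -v mounts the named volume ollama_models (assumed mount path).
  docker run -d --name ollama \
    --gpus all \
    -e OLLAMA_HOST=0.0.0.0:11434 \
    -p "${LAN_IP}:11434:11434" \
    -v ollama_models:/root/.ollama \
    ollama/ollama
}
# Usage: LAN_IP=<workstation LAN address> start_ollama
```

Publishing with an explicit host IP is what makes the port unreachable from localhost while the process inside the container still listens on 0.0.0.0.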
Current Model Inventory (2026-05-02)¶
Text Models¶
| Model | Size | Status | Notes |
|---|---|---|---|
| qwen3.5:4b | 2.6 GB | Ready | Tier 2 fallback if qwen3-vl:4b unavailable |
| qwen3:4b | 3.0 GB | Ready | Legacy; being phased out for qwen3.5 |
| glm-4.7-flash | 1.8 GB | Ready | Lightweight; good for fast inference |
| llama3.2 | 2.0 GB | Ready | General-purpose text generation |
Vision-Language (VL) Models¶
| Model | Size | Status | Notes |
|---|---|---|---|
| qwen3-vl:4b | 7.8 GB | Pulling... | Primary Tier 1 for cam01 inference |
| qwen3-vl:2b | 4.1 GB | Not pulled | Fallback if 4b OOM (unlikely; 4GB Pi 5 can share) |
Pulling status: As of 2026-05-02, qwen3-vl:4b is being pulled to local disk. Monitor with:
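The original monitoring command did not survive in this copy of the page. One plausible way to watch a pull, assuming the container name `ollama` from the Bind Configuration section and the standard `ollama` CLI inside the container, is sketched here.

```shell
watch_pull() {
  # Follow recent container logs; download activity may be logged while a pull runs.
  docker logs --follow --tail 20 ollama
}

list_local_models() {
  # A model shows up in this listing only after its pull has finished.
  docker exec ollama ollama list
}
# Usage: watch_pull        (Ctrl-C to stop following)
#        list_local_models
```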
When the pull completes, expect a 7.8 GB model file in /var/lib/docker/volumes/ollama_models/_data.
Why CanEast AI Node (Not caneast-site1-node2/3/4)¶
| Aspect | CanEast AI Node | caneast-site1-node2 | caneast-site1-node4 |
|---|---|---|---|
| GPU | NVIDIA RTX (FP32 fast) | None | Intel UHD 630 (slow for LLM) |
| Memory | 32 GB | 8 GB | 16 GB |
| Always-on | No (dev workstation) | Yes | Yes |
| Use case | AI ops, eval, dev | Docker apps, NVR | k3s control plane |
Selected: CanEast AI Node - GPU acceleration and sufficient memory for larger models outweigh the always-on requirement (dev sessions cover most hours).

Trade-off: When CanEast AI Node is suspended, Tier 1 inference fails and fallback to cloud tiers kicks in. This is acceptable because:
1. Zone OT-2 (cam01) permits cloud fallback per LLMOPS-0002
2. Ops monitoring will flag repeated Tier 1 failures
3. Future migration to a dedicated Ollama GPU node is a Phase 2 decision
Known Limitations¶
CanEast AI Node Is Not a 24/7 Server¶
The CanEast AI Node workstation is a dev machine, not production infrastructure. When suspended or in sleep mode:
- Ollama container remains running (Docker daemon is WSL-managed)
- But the WSL VM itself may suspend, pausing all processes
- Network stack may become unresponsive
Operational impact:
- Early morning (before 7 AM) when workstation is off: Tier 1 fails for ~30 min
- Workday (7 AM - 6 PM): Ollama is reliably available
- Evening: Depends on active dev sessions
Monitoring: once deployed, the k8sgpt operator on caneast-site1-node4 will detect an unreachable Ollama and escalate; capture.py on caneast-site1-ot2-cam01 automatically falls back to the cloud tiers.
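The fallback behaviour lives in capture.py, which is not reproduced here. The shell sketch below only illustrates the general "try Tier 1, then fall through" pattern; `try_tiers`, `tier1_ollama`, and `tier2_cloud` are hypothetical names, and the cloud tier is a stub.

```shell
# try_tiers CMD...: run each tier command in order and print the output of the
# first one that succeeds. Returns non-zero if every tier fails.
try_tiers() {
  for tier in "$@"; do
    if out="$("$tier")"; then
      printf '%s\n' "$out"
      return 0
    fi
    echo "tier '$tier' failed, trying next" >&2
  done
  return 1
}

# Hypothetical tier functions for illustration only.
tier1_ollama() {
  # A short timeout keeps a suspended workstation from stalling the pipeline.
  curl --fail --silent --max-time 10 "${OLLAMA_URL:-http://REDACTED:11434}/api/tags"
}
tier2_cloud() {
  echo "cloud tier placeholder"
}
# Usage: try_tiers tier1_ollama tier2_cloud
```

Bounding the Tier 1 attempt with a timeout matters here because a suspended WSL VM tends to hang connections rather than refuse them.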
Model Selection Constraints¶
- Only models tested to fit in 32 GB VRAM are deployed
- qwen3-vl:4b is near the limit; larger VL models (qwen3-vl:7b, llama4-vision:13b) are not candidates
- Text models are kept lightweight (<=4b params) to leave GPU memory for simultaneous requests
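To check how much GPU memory headroom is actually left before adding a model, something along these lines may help. It assumes `nvidia-smi` is available in the WSL environment; `gpu_free_mib` is an illustrative helper, not an existing tool.

```shell
# Reads "used, total" MiB pairs (one GPU per line) on stdin and prints
# the free MiB for each GPU.
gpu_free_mib() {
  awk -F', *' '{ print $2 - $1 }'
}
# Usage:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#     --format=csv,noheader,nounits | gpu_free_mib
```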
Verification¶
Check Ollama health:
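The health-check command is missing from this copy of the page. Given that the expected response is a model listing, it was most likely a request to Ollama's `/api/tags` endpoint; a sketch under that assumption follows, with `OLLAMA_URL` standing in for the redacted address.

```shell
# Assumed base URL; substitute the node's real LAN address for REDACTED.
OLLAMA_URL="${OLLAMA_URL:-http://REDACTED:11434}"

ollama_health() {
  # --fail makes curl exit non-zero on HTTP errors; --max-time bounds the wait
  # so a suspended workstation is reported quickly.
  curl --fail --silent --show-error --max-time 5 "$OLLAMA_URL/api/tags"
}
# Usage: ollama_health && echo "Ollama is up"
```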
Expected response (example):
    {
      "models": [
        {
          "name": "qwen3.5:4b",
          "modified_at": "2026-05-01T10:30:00Z",
          "size": 2669856768
        },
        {
          "name": "qwen3-vl:4b",
          "modified_at": "2026-05-02T14:00:00Z",
          "size": 8374390784
        }
      ]
    }
If there is no response or the connection is refused, Ollama is down or unreachable. Check Docker:
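The Docker check itself is not preserved here. A reasonable sketch, run inside WSL on CanEast AI Node and assuming the container name `ollama`, is:

```shell
check_ollama_container() {
  # Show the container's state (running, exited, restarting).
  docker ps --all --filter name=ollama --format '{{.Names}}: {{.Status}}'
  # Show recent log lines to spot crashes or GPU initialization errors.
  docker logs --tail 50 ollama
}
# Usage: check_ollama_container
#        docker restart ollama    # only if the container is exited or hung
```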
Model Management¶
Pull a New Model¶
Example (pull qwen3.5:7b):
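The example command is missing from this copy. A sketch using Ollama's `/api/pull` endpoint follows; `OLLAMA_URL` is a stand-in for the redacted address, and the `model` request field is taken from current Ollama API documentation (older releases used `name`), so adjust it if the deployed version differs.

```shell
# Assumed base URL; substitute the node's real LAN address for REDACTED.
OLLAMA_URL="${OLLAMA_URL:-http://REDACTED:11434}"

pull_model() {
  # $1 = model tag, e.g. qwen3.5:7b
  # -d sends a POST; progress is streamed back as JSON lines until done.
  curl --fail --show-error "$OLLAMA_URL/api/pull" -d "{\"model\": \"$1\"}"
}
# Usage: pull_model qwen3.5:7b
```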
This is a long-running operation; curl will block until the model is fully downloaded and placed in the ollama_models volume.
Delete a Model¶
Example (remove qwen3:4b to save space):
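The delete command is also missing here. Assuming Ollama's `/api/delete` endpoint (an HTTP DELETE with a JSON body) and the same placeholder `OLLAMA_URL`, it could look like this:

```shell
# Assumed base URL; substitute the node's real LAN address for REDACTED.
OLLAMA_URL="${OLLAMA_URL:-http://REDACTED:11434}"

delete_model() {
  # $1 = model tag, e.g. qwen3:4b
  # --fail turns a "model not found" HTTP error into a non-zero exit code.
  curl --fail --silent --show-error -X DELETE "$OLLAMA_URL/api/delete" \
    -d "{\"model\": \"$1\"}"
}
# Usage: delete_model qwen3:4b
```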
List All Models¶
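This section's command was lost in this copy. Listing is presumably the same `/api/tags` request used for the health check; the sketch below adds a small, deliberately simple name extractor (`model_names` is an illustrative helper that relies on grep/sed pattern matching rather than a real JSON parser, so `jq` is preferable where installed).

```shell
# Assumed base URL; substitute the node's real LAN address for REDACTED.
OLLAMA_URL="${OLLAMA_URL:-http://REDACTED:11434}"

list_models_json() {
  curl --fail --silent --show-error "$OLLAMA_URL/api/tags"
}

# Reads an /api/tags response on stdin and prints one model name per line.
model_names() {
  grep -o '"name": *"[^"]*"' | sed 's/^"name": *"//; s/"$//'
}
# Usage: list_models_json | model_names
```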
Integration Points¶
caneast-site1-ot2-cam01 Vision Inference¶
The cam01 capture.py process POSTs frames to Ollama's /api/generate endpoint:
    curl http://REDACTED:[REDACTED]/api/generate \
      -d '{
        "model": "qwen3-vl:4b",
        "prompt": "Describe what you see in this image in the context of a residential refrigerator. Return JSON.",
        "images": ["base64-encoded-jpeg"],
        "stream": false
      }'
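To try the same request by hand with a real frame, the image has to be base64-encoded on a single line. The following is a sketch only: `b64_file` and `describe_frame` are hypothetical helpers, not part of capture.py, and `OLLAMA_URL` is a stand-in for the redacted address.

```shell
# Assumed base URL; substitute the node's real LAN address for REDACTED.
OLLAMA_URL="${OLLAMA_URL:-http://REDACTED:11434}"

# Base64-encode a file on one line (portable alternative to GNU `base64 -w0`).
b64_file() {
  base64 < "$1" | tr -d '\n'
}

describe_frame() {
  # $1 = path to a JPEG frame.
  # The payload is piped to curl (-d @-) so large images never hit
  # command-line length limits.
  printf '{"model": "qwen3-vl:4b", "prompt": "%s", "images": ["%s"], "stream": false}' \
    "Describe what you see in this image in the context of a residential refrigerator. Return JSON." \
    "$(b64_file "$1")" |
    curl --fail --silent --show-error "$OLLAMA_URL/api/generate" -d @-
}
# Usage: describe_frame /path/to/frame.jpg
```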
k8sgpt Operator (Future)¶
When k8sgpt-operator is deployed to archon-monitoring namespace, it will use Ollama as the LLM backend for cluster diagnostics.
Connection string: http://REDACTED:11434
Models: k8sgpt auto-selects based on available models; currently would default to qwen3:4b or llama3.2 (text generation), later to qwen3-vl:4b if available.
Troubleshooting¶
| Symptom | Diagnosis | Fix |
|---|---|---|
| curl: (7) Failed to connect | CanEast AI Node offline or WSL suspended | Resume CanEast AI Node; start Docker daemon if needed |
| curl: (28) Operation timed out | Ollama responding slowly or hung | Check docker logs ollama; restart container if needed |
| Model very slow (>30s) | OOM or resource contention on CanEast AI Node | Check GPU memory with nvidia-smi; reduce concurrent requests |
| curl returns empty or null | Model not found | Query /api/tags to confirm the model is present; pull it if missing |
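For scripted checks, the first two rows of the table can be keyed off curl's exit code (7 for a failed connection, 28 for a timeout). A small illustrative helper, with `diagnose_curl_exit` as a hypothetical name:

```shell
# Map a curl exit code to the matching diagnosis from the troubleshooting table.
diagnose_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connection failed: node offline or WSL suspended" ;;
    28) echo "timeout: Ollama slow or hung" ;;
    *)  echo "unexpected curl exit code $1" ;;
  esac
}
# Usage:
#   curl --silent --max-time 10 "$OLLAMA_URL/api/tags" >/dev/null
#   diagnose_curl_exit $?
```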
Related Documentation¶
- docs/platform/frigate-caneast-site1-node4.md - Frigate NVR (consumes RTSP from Pi 5, not Ollama output)
- docs/architecture/cam01-capture-pipeline.md - cam01 inference pipeline (Ollama as Tier 1)
- docs/internal/vl-inference-cam01.md - 5-tier inference fallback chain details
- docs/adr/it/LLMOPS/LLMOPS-0002-agent-plane.md - LLMOPS decisions and Ollama placement rationale