Consolidated from ADR-0030 and ADR-0039 on 2026-05-02 per ADR-0047. Source files retained with deprecation banners at
docs/adr/0030-ask-archy-docs-chat-widget.mdanddocs/adr/0039-ai-operations-agent-plane.md.
LLMOPS-0002 - AI Agent Plane: Ask Archy and AI Operations¶
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-04-22 (latest source) |
| Author | Ben Peries |
| Sources | ADR-0030 (Ask Archy docs chat widget), ADR-0039 (AI operations agent plane) |
Context¶
Two decisions were established in sequence, forming the full AI agent plane:
-
Ask Archy (ADR-0030): The Archon Platform documentation at docs.peries.ca is designed to be AI-portable and machine-readable. A visitor landing on a complex architecture page should be able to ask a question - demonstrating the platform philosophy rather than just describing it. A floating chat widget powered by a hosted LLM was needed at zero cost.
-
AI operations agent plane (ADR-0039): OBS-0001 established the observability data plane. This ADR establishes the first AI operations primitive on top of that substrate - the agent plane - which consumes platform telemetry, reasons about platform state, and provides operational intelligence to human operators.
Decision¶
Part 1: Ask Archy - public documentation chat widget (from ADR-0030)¶
Deploy a floating chat widget on docs.peries.ca powered by Groq free tier (llama-3.1-8b-instruct), routed through a Cloudflare Worker for security and rate limiting. The widget is named "Ask Archy" - the Archon Platform documentation assistant.
The widget answers questions grounded in the current page context only. When a question is out of scope, it directs the visitor to connect with Ben on LinkedIn or via email.
This widget is a demonstration of AI portability, not a production agentic system. It does not have access to live platform state, ADO boards, or internal tooling. The boundary between the public documentation layer and the internal platform layer must be maintained. Ask Archy operates entirely on the public side of that boundary.
Why Groq over other options¶
| Option | Cost | Decision |
|---|---|---|
| Groq (llama-3.1-8b-instruct) | Free tier | Selected - fast response, generous free tier |
| DashScope qwen-turbo | Free tier | Backup option (key already in Infisical) |
| Anthropic Claude API | Paid | Rejected - cost exposure on unauthenticated public traffic |
| Ollama self-hosted | Free | Rejected - exposes internal inference infra to public internet (CISO concern) |
Security controls (defense in depth)¶
| Layer | Control | Purpose |
|---|---|---|
| Cloudflare Turnstile | Bot verification on first message | Blocks scripted abuse |
| Cloudflare Worker rate limiting | 100 requests per IP per hour | Prevents API cost attacks |
| CORS locked to docs.peries.ca | Worker rejects other origins | Prevents hotlinking |
| Max 400 tokens per response | Hard cap in Groq API call | Limits per-request cost |
| System prompt lockdown | Agent cannot follow user instructions | Blocks prompt injection |
| No internal context | Widget only sees current page text | No internal data exposure |
**Known security tradeoff - Turnstile single-use token:REDACTED After the first message is sent,
the Turnstile token is consumed. Subsequent messages in the same session send an empty token;
CORS is the only gate for message 2+. CORS is enforced by browsers, not the internet -
server-side scripts can forge an Origin: https://docs.peries.ca header. An attacker can
loop requests to burn the Groq free tier quota; the KV rate limiter (100 req/IP/hr) kicks
in for real users. Blast radius: degraded Archy availability, not a data breach or a bill.
The proper fix (HMAC session token) is over-engineered for a homelab docs chatbot on a free
tier quota. Risk accepted.
Deployment¶
- Ask Archy widget deployed to docs.peries.ca via MkDocs
extra_javascript - Cloudflare Worker
ask-archydeployed in archon-web repo GROQ_API_KEYstored in Infisical archon-platform project- Cloudflare Turnstile sitekey and secret stored in Infisical
Addendum - 2026-04-12¶
Rate limit raised from 30 to 100 req/IP/hr (WI-164, WI-165): the initial limit proved
too restrictive for legitimate multi-turn conversations. Rate-limit response changed from
HTTP 429 to a friendly 200 { reply: "..." } so the widget displays it as a regular Archy
chat bubble rather than a silent failure.
System prompt expanded (WI-161, WI-163): Full self-awareness added - name origin, creator attribution, ADR-0030 reference, platform description. General language-detection instruction added: detect the language the user writes in and always respond in that language.
sessionStorage persistence added (WI-160): Chat history persisted to sessionStorage
under key archy-history. History survives MkDocs SPA-style page navigation within the
same tab. Explicit X-close clears history.
Part 2: AI operations agent plane (from ADR-0039)¶
Primary: k8sgpt-operator¶
Deploy k8sgpt-operator to the archon-monitoring namespace on caneast-site1-node4's k3s cluster.
k8sgpt performs continuous AI-driven diagnostics of k3s cluster state - failing pods,
misconfigured resources, PVC issues, image pull failures. Results are published as k8s
custom resources and surfaced in OBS-0001 Grafana and Alertmanager routing.
Multi-backend LLM strategy¶
The agent plane does not commit to a single LLM provider. Different workloads have different privacy, latency, and cost profiles.
Available backends:
| Backend | Type | Authentication |
|---|---|---|
| Ollama (local) | Self-hosted, sovereign | None (network-local) |
| Alibaba DashScope (qwen-turbo, qwen-max) | Hosted | DASHSCOPE_API_KEY in Infisical |
| Groq (Llama 3.x, Mixtral) | Hosted, high-speed | API key in Infisical |
| Anthropic API (Claude) | Hosted | API key in Infisical |
Routing policy:
| Use case | Backend | Rationale |
|---|---|---|
| k8sgpt cluster diagnostics | Ollama | Cluster topology is sovereign; data does not leave environment |
| OT-AI agents (future) | Ollama (MANDATORY) | OT telemetry is regulated-equivalent data; hard rule |
| Archy Infra natural-language queries (future) | Ollama default, Groq optional | IT metadata is sensitive; local default, hosted only for explicit low-sensitivity queries |
| Public-facing translation (peries.ca) | DashScope qwen-turbo | Already decided; public content; no sovereignty concern |
| Platform documentation drafting | Anthropic API or hosted | Public platform docs, not operational data |
Sovereignty rule (MANDATORY hard rule)¶
Any AI operations workload that consumes OT telemetry, cluster internals, secrets metadata, or identifiable platform topology MUST use a local backend (Ollama). Hosted backends are permitted only for workloads where the input data is public or explicitly sanitized.
This is a hard rule, not a guideline. The Archon Platform includes OT telemetry that represents real operational data. A soft "prefer local" guideline would not be credible. The hard rule is both the right technical choice and the right portfolio signal for a platform handling regulated-equivalent data.
Ollama placement (interim)¶
k8sgpt's interim LLM backend is the Ollama instance on the CanEast AI Node workstation at
REDACTED:11434 (DEV-0001/DEV-0002). This is a known-degraded dependency:
CanEast AI Node is not a 24/7 server. When suspended, k8sgpt analyses requiring LLM explanation
will fail.
Acceptable degraded state: When Ollama is unreachable, k8sgpt falls back to non-LLM analysis mode - it still reports detected problems, without natural-language explanation. The platform remains functional.
Pre-flight requirement (WI-318): Before k8sgpt implementation, confirm:
1. Ollama is listening on 0.0.0.0 (not 127.0.0.1 only)
2. The WSL network path exposes the port to the LAN
3. Firewall permits caneast-site1-node4 โ CanEast AI Node:[REDACTED]
If any fail, the interim placement is invalid and a follow-up Ollama placement ADR is required before k8sgpt implementation proceeds.
Candidate targets for a dedicated Ollama deployment: 1. caneast-site1-node4 CPU inference - always-on, slow, uses small models (phi3:mini, qwen2.5:3b) 2. Dedicated GPU node - hardware acquisition required 3. Status quo with monitoring - treat Ollama unreachability as non-critical degraded state
Agent plane boundaries¶
The agent plane is advisory only at this stage. Agents surface findings, explain state, and propose remediation. They do NOT execute changes. Autonomous remediation requires a future ADR with explicit scope, safety rails, and audit mechanisms - not justified for a five-node platform at this phase.
Why k8sgpt, not a custom build: k8sgpt is an established open-source project with native Ollama support, built-in k8s analyzers, PII filtering before LLM submission, and an operator pattern for continuous scanning. Building equivalent capability from scratch would consume Phase 2 entirely.
Why multi-backend: Committing to a single LLM provider creates migration-risk coupling. Multi-backend treats LLMs as interchangeable infrastructure, aligning with the AI portability principle (DEV-0001/DEV-0002).
Why advisory-only: Autonomous remediation with live OT telemetry and real infrastructure requires failure-mode analysis and blast-radius controls not justified at this fleet scale. The platform's credibility depends on demonstrating judgment about autonomy boundaries.
Consequences¶
Ask Archy:
- Widget deployed to docs.peries.ca; Cloudflare Worker ask-archy in archon-web repo
- Groq free tier is the inference backend; DashScope is the documented fallback
- Anthropic Claude API explicitly excluded from public widget use (cost exposure)
- Turnstile HMAC gap accepted; blast radius is degraded availability only
AI operations agent plane:
- k8sgpt-operator deployment to archon-monitoring namespace on caneast-site1-node4
- Multi-backend routing policy is the standard for all future AI operations work
- Sovereignty rule is in effect; all OT telemetry and cluster internals use local Ollama
- Agent plane is advisory-only; autonomous remediation deferred to a future ADR
- LLMOPS-0001 (OpenClaw) and this ADR together complete the Phase 3 AI agent architecture
References¶
- OBS-0001 - IT observability data plane (k8sgpt surfaces findings to Grafana; substrate dependency)
- PLAT-0003 - k3s control plane migration to caneast-site1-node4 (k8sgpt deployment target)
- DEV-0001 - Developer environment (CanEast AI Node Ollama endpoint; AI portability principle)
- DEV-0002 - CanEast AI Node workstation-as-code (Ollama interim placement context)
- LLMOPS-0001 - OpenClaw gateway (Telegram ChatOps; separate from k8sgpt advisory)
- IAM-0001 - Infisical for secrets (Groq, DashScope, Anthropic API keys)
- WI-259 - Epic E3: AI Applications - OpenClaw & OT-AI
- WI-321 - k8sgpt AI operations agent plane
- WI-318 - Ollama reachability audit from caneast-site1-node4 (pre-flight gate for k8sgpt)
Addendum -- 2026-05-02¶
Residential OT zone sovereignty clarification (WI-395, caneast-site1-ot2-cam01):
The sovereignty rule in the Decision section applies to industrial OT telemetry. Zone OT-2 (main floor and garage) is residential in nature and is explicitly excluded from the industrial data sovereignty constraint.
Zone classification: - ot-zone (basement): industrial-equivalent sensor data; Ollama mandatory - ot-zone (main floor, garage): residential/consumer; cloud inference permitted - ot-zone (outdoor): residential/consumer; cloud inference permitted - ot-zone (rack monitoring): infrastructure telemetry; Ollama mandatory
For caneast-site1-ot2-cam01 (zone ot-zone), the inference tier ladder applies: Tier 1 = Ollama qwen3-vl:4b (primary, sovereign, always attempted first); Tier 2 = DashScope (fallback on Ollama unavailability); Tier 3 = Mistral Pixtral (secondary fallback); Tier 4 = Groq Llama4 (tertiary fallback). Bench-only: Gemini (not in production rotation).
Ollama remains the default and preferred tier for all zones. Cloud fallback tiers are permitted for residential zones only and only when Ollama is unavailable.