Consolidated from ADR-0030 and ADR-0039 on 2026-05-02 per ADR-0047. Source files retained with deprecation banners at docs/adr/0030-ask-archy-docs-chat-widget.md and docs/adr/0039-ai-operations-agent-plane.md.

LLMOPS-0002 - AI Agent Plane: Ask Archy and AI Operations¶

Field	Value
Status	Accepted
Date	2026-04-22 (latest source)
Author	Ben Peries
Sources	ADR-0030 (Ask Archy docs chat widget), ADR-0039 (AI operations agent plane)

Context¶

Two decisions were established in sequence, forming the full AI agent plane:

Ask Archy (ADR-0030): The Archon Platform documentation at docs.peries.ca is designed to be AI-portable and machine-readable. A visitor landing on a complex architecture page should be able to ask a question - demonstrating the platform philosophy rather than just describing it. A floating chat widget powered by a hosted LLM was needed at zero cost.
AI operations agent plane (ADR-0039): OBS-0001 established the observability data plane. This ADR establishes the first AI operations primitive on top of that substrate - the agent plane - which consumes platform telemetry, reasons about platform state, and provides operational intelligence to human operators.

Decision¶

Deploy a floating chat widget on docs.peries.ca powered by Groq free tier (llama-3.1-8b-instruct), routed through a Cloudflare Worker for security and rate limiting. The widget is named "Ask Archy" - the Archon Platform documentation assistant.

The widget answers questions grounded in the current page context only. When a question is out of scope, it directs the visitor to connect with Ben on LinkedIn or via email.

This widget is a demonstration of AI portability, not a production agentic system. It does not have access to live platform state, ADO boards, or internal tooling. The boundary between the public documentation layer and the internal platform layer must be maintained. Ask Archy operates entirely on the public side of that boundary.

Why Groq over other options¶

Option	Cost	Decision
Groq (llama-3.1-8b-instruct)	Free tier	Selected - fast response, generous free tier
DashScope qwen-turbo	Free tier	Backup option (key already in Infisical)
Anthropic Claude API	Paid	Rejected - cost exposure on unauthenticated public traffic
Ollama self-hosted	Free	Rejected - exposes internal inference infra to public internet (CISO concern)

Security controls (defense in depth)¶

Layer	Control	Purpose
Cloudflare Turnstile	Bot verification on first message	Blocks scripted abuse
Cloudflare Worker rate limiting	100 requests per IP per hour	Prevents API cost attacks
CORS locked to docs.peries.ca	Worker rejects other origins	Prevents hotlinking
Max 400 tokens per response	Hard cap in Groq API call	Limits per-request cost
System prompt lockdown	Agent cannot follow user instructions	Blocks prompt injection
No internal context	Widget only sees current page text	No internal data exposure

**Known security tradeoff - Turnstile single-use token:REDACTED After the first message is sent, the Turnstile token is consumed. Subsequent messages in the same session send an empty token; CORS is the only gate for message 2+. CORS is enforced by browsers, not the internet - server-side scripts can forge an Origin: https://docs.peries.ca header. An attacker can loop requests to burn the Groq free tier quota; the KV rate limiter (100 req/IP/hr) kicks in for real users. Blast radius: degraded Archy availability, not a data breach or a bill. The proper fix (HMAC session token) is over-engineered for a homelab docs chatbot on a free tier quota. Risk accepted.

Deployment¶

Ask Archy widget deployed to docs.peries.ca via MkDocs extra_javascript
Cloudflare Worker ask-archy deployed in archon-web repo
GROQ_API_KEY stored in Infisical archon-platform project
Cloudflare Turnstile sitekey and secret stored in Infisical

Addendum - 2026-04-12¶

Rate limit raised from 30 to 100 req/IP/hr (WI-164, WI-165): the initial limit proved too restrictive for legitimate multi-turn conversations. Rate-limit response changed from HTTP 429 to a friendly 200 { reply: "..." } so the widget displays it as a regular Archy chat bubble rather than a silent failure.

System prompt expanded (WI-161, WI-163): Full self-awareness added - name origin, creator attribution, ADR-0030 reference, platform description. General language-detection instruction added: detect the language the user writes in and always respond in that language.

sessionStorage persistence added (WI-160): Chat history persisted to sessionStorage under key archy-history. History survives MkDocs SPA-style page navigation within the same tab. Explicit X-close clears history.

Part 2: AI operations agent plane (from ADR-0039)¶

Primary: k8sgpt-operator¶

Deploy k8sgpt-operator to the archon-monitoring namespace on caneast-site1-node4's k3s cluster. k8sgpt performs continuous AI-driven diagnostics of k3s cluster state - failing pods, misconfigured resources, PVC issues, image pull failures. Results are published as k8s custom resources and surfaced in OBS-0001 Grafana and Alertmanager routing.

Multi-backend LLM strategy¶

The agent plane does not commit to a single LLM provider. Different workloads have different privacy, latency, and cost profiles.

Available backends:

Backend	Type	Authentication
Ollama (local)	Self-hosted, sovereign	None (network-local)
Alibaba DashScope (qwen-turbo, qwen-max)	Hosted	`DASHSCOPE_API_KEY` in Infisical
Groq (Llama 3.x, Mixtral)	Hosted, high-speed	API key in Infisical
Anthropic API (Claude)	Hosted	API key in Infisical

Routing policy:

Use case	Backend	Rationale
k8sgpt cluster diagnostics	Ollama	Cluster topology is sovereign; data does not leave environment
OT-AI agents (future)	Ollama (MANDATORY)	OT telemetry is regulated-equivalent data; hard rule
Archy Infra natural-language queries (future)	Ollama default, Groq optional	IT metadata is sensitive; local default, hosted only for explicit low-sensitivity queries
Public-facing translation (peries.ca)	DashScope qwen-turbo	Already decided; public content; no sovereignty concern
Platform documentation drafting	Anthropic API or hosted	Public platform docs, not operational data

Sovereignty rule (MANDATORY hard rule)¶

Any AI operations workload that consumes OT telemetry, cluster internals, secrets metadata, or identifiable platform topology MUST use a local backend (Ollama). Hosted backends are permitted only for workloads where the input data is public or explicitly sanitized.

This is a hard rule, not a guideline. The Archon Platform includes OT telemetry that represents real operational data. A soft "prefer local" guideline would not be credible. The hard rule is both the right technical choice and the right portfolio signal for a platform handling regulated-equivalent data.

Ollama placement (interim)¶

k8sgpt's interim LLM backend is the Ollama instance on the CanEast AI Node workstation at REDACTED:11434 (DEV-0001/DEV-0002). This is a known-degraded dependency: CanEast AI Node is not a 24/7 server. When suspended, k8sgpt analyses requiring LLM explanation will fail.

Acceptable degraded state: When Ollama is unreachable, k8sgpt falls back to non-LLM analysis mode - it still reports detected problems, without natural-language explanation. The platform remains functional.

Pre-flight requirement (WI-318): Before k8sgpt implementation, confirm: 1. Ollama is listening on 0.0.0.0 (not 127.0.0.1 only) 2. The WSL network path exposes the port to the LAN 3. Firewall permits caneast-site1-node4 → CanEast AI Node:[REDACTED]

If any fail, the interim placement is invalid and a follow-up Ollama placement ADR is required before k8sgpt implementation proceeds.

Candidate targets for a dedicated Ollama deployment: 1. caneast-site1-node4 CPU inference - always-on, slow, uses small models (phi3:mini, qwen2.5:3b) 2. Dedicated GPU node - hardware acquisition required 3. Status quo with monitoring - treat Ollama unreachability as non-critical degraded state

Agent plane boundaries¶

The agent plane is advisory only at this stage. Agents surface findings, explain state, and propose remediation. They do NOT execute changes. Autonomous remediation requires a future ADR with explicit scope, safety rails, and audit mechanisms - not justified for a five-node platform at this phase.

Why k8sgpt, not a custom build: k8sgpt is an established open-source project with native Ollama support, built-in k8s analyzers, PII filtering before LLM submission, and an operator pattern for continuous scanning. Building equivalent capability from scratch would consume Phase 2 entirely.

Why multi-backend: Committing to a single LLM provider creates migration-risk coupling. Multi-backend treats LLMs as interchangeable infrastructure, aligning with the AI portability principle (DEV-0001/DEV-0002).

Why advisory-only: Autonomous remediation with live OT telemetry and real infrastructure requires failure-mode analysis and blast-radius controls not justified at this fleet scale. The platform's credibility depends on demonstrating judgment about autonomy boundaries.

Consequences¶

Ask Archy: - Widget deployed to docs.peries.ca; Cloudflare Worker ask-archy in archon-web repo - Groq free tier is the inference backend; DashScope is the documented fallback - Anthropic Claude API explicitly excluded from public widget use (cost exposure) - Turnstile HMAC gap accepted; blast radius is degraded availability only

AI operations agent plane: - k8sgpt-operator deployment to archon-monitoring namespace on caneast-site1-node4 - Multi-backend routing policy is the standard for all future AI operations work - Sovereignty rule is in effect; all OT telemetry and cluster internals use local Ollama - Agent plane is advisory-only; autonomous remediation deferred to a future ADR - LLMOPS-0001 (OpenClaw) and this ADR together complete the Phase 3 AI agent architecture

References¶

OBS-0001 - IT observability data plane (k8sgpt surfaces findings to Grafana; substrate dependency)
PLAT-0003 - k3s control plane migration to caneast-site1-node4 (k8sgpt deployment target)
DEV-0001 - Developer environment (CanEast AI Node Ollama endpoint; AI portability principle)
DEV-0002 - CanEast AI Node workstation-as-code (Ollama interim placement context)
LLMOPS-0001 - OpenClaw gateway (Telegram ChatOps; separate from k8sgpt advisory)
IAM-0001 - Infisical for secrets (Groq, DashScope, Anthropic API keys)
WI-259 - Epic E3: AI Applications - OpenClaw & OT-AI
WI-321 - k8sgpt AI operations agent plane
WI-318 - Ollama reachability audit from caneast-site1-node4 (pre-flight gate for k8sgpt)

Addendum -- 2026-05-02¶

Residential OT zone sovereignty clarification (WI-395, caneast-site1-ot2-cam01):

The sovereignty rule in the Decision section applies to industrial OT telemetry. Zone OT-2 (main floor and garage) is residential in nature and is explicitly excluded from the industrial data sovereignty constraint.

Zone classification: - ot-zone (basement): industrial-equivalent sensor data; Ollama mandatory - ot-zone (main floor, garage): residential/consumer; cloud inference permitted - ot-zone (outdoor): residential/consumer; cloud inference permitted - ot-zone (rack monitoring): infrastructure telemetry; Ollama mandatory

For caneast-site1-ot2-cam01 (zone ot-zone), the inference tier ladder applies: Tier 1 = Ollama qwen3-vl:4b (primary, sovereign, always attempted first); Tier 2 = DashScope (fallback on Ollama unavailability); Tier 3 = Mistral Pixtral (secondary fallback); Tier 4 = Groq Llama4 (tertiary fallback). Bench-only: Gemini (not in production rotation).

Ollama remains the default and preferred tier for all zones. Cloud fallback tiers are permitted for residential zones only and only when Ollama is unavailable.