
Deprecated — Consolidated into LLMOPS-0002 on 2026-05-02 per ADR-0047. This source file is retained as a reference; the canonical content is in LLMOPS-0002.

ADR-0030: Ask Archy -- AI-Powered Documentation Chat Widget

| Status | Accepted |
| Date | 2026-04-11 |
| Author | Ben Peries |
| Phases | Phase 4 (public platform presence) |
| ADO WI | 158 |

Context

In traditional IT, the first skill taught was how to search effectively. For roughly 15 years, "Google it" was the default answer when someone did not know something. Over the last 1-2 years that has shifted to "ChatGPT it" or "ask AI." The way people seek technical knowledge has fundamentally changed.

The Archon Platform documentation at docs.peries.ca is designed to be AI-portable and machine-readable. It is the external face of the platform for a technical audience including potential employers, collaborators, and peers in the infrastructure community. A visitor landing on a complex architecture page should not have to read the entire docs tree to get an answer to a specific question. They should be able to ask.

Additionally, the platform is being built as a blueprint for AI-driven IT operations. Having an AI agent embedded in the documentation is not just a convenience feature -- it is a demonstration of the philosophy. The docs explain AI portability. The widget demonstrates it.

Decision

Deploy a floating chat widget on docs.peries.ca, powered by an LLM on the Groq free tier (llama-3.1-8b-instruct) and routed through a Cloudflare Worker for security and rate limiting. The widget is named "Ask Archy" and introduces itself as Archy, the Archon Platform documentation assistant.

The widget answers questions grounded in the current page context only. When a question is out of scope, it directs the visitor to connect with Ben on LinkedIn or via email.
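A minimal sketch of how the Worker might ground each answer in the current page, assuming Groq's OpenAI-compatible chat completions request shape (`model`, `messages`, `max_tokens`). The function name `buildGroqRequest`, the prompt wording, and the page-content markers are hypothetical and not taken from the ask-archy source. The model string is copied from this ADR and should be checked against Groq's published model identifiers.

```typescript
// Hypothetical sketch: names and prompt wording are illustrative only.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GroqChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens: number;
}

// Model name as written in this ADR; confirm the exact identifier with Groq.
const MODEL = "llama-3.1-8b-instruct";
// Hard cap on response length, matching the 400-token limit in this ADR.
const MAX_RESPONSE_TOKENS = 400;

// Builds the request body: the page text goes into the system prompt so the
// model answers from the current page only.
function buildGroqRequest(pageText: string, question: string): GroqChatRequest {
  const systemPrompt = [
    "You are Archy, the Archon Platform documentation assistant.",
    "Answer only from the page content between the markers below.",
    "If the answer is not there, say so and suggest contacting Ben on LinkedIn or by email.",
    "--- PAGE CONTENT START ---",
    pageText,
    "--- PAGE CONTENT END ---",
  ].join("\n");

  return {
    model: MODEL,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: question },
    ],
    max_tokens: MAX_RESPONSE_TOKENS,
  };
}
```

The Worker would serialize this object as the JSON body of its POST to Groq. Keeping the page text in the system message rather than the user message keeps the visitor's input separate from the grounding context.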

Honest Assessment -- The Agentic Gap

This widget is a demonstration of AI portability, not a production agentic system. It does not have access to live platform state, ADO boards, or internal tooling. It answers from static documentation context only.

This is intentional. A public-facing AI agent with access to internal infrastructure would be a significant attack surface. The boundary between the public documentation layer and the internal platform layer must be maintained. Ask Archy operates entirely on the public side of that boundary.

As the platform evolves, internal operations are handled by a separate authenticated agent (already partially implemented via OpenClaw and SentinelBot), which remains strictly internal.

Why Groq Over Other Options

| Option | Cost | Speed | Rationale |
|---|---|---|---|
| Groq (llama-3.1-8b-instruct) | Free tier | Very fast | Chosen |
| DashScope qwen-turbo | Free tier | Medium | Already in Infisical, backup option |
| Anthropic Claude API | Paid | Fast | Cost exposure on public traffic |
| Ollama (self-hosted) | Free | Medium | Not designed for public internet traffic |
| Ollama cloud | Free tier | Medium | Adds vendor dependency |

Groq was chosen for response speed on a chat widget UX and a generous free tier appropriate for a low-traffic portfolio site. DashScope is the documented fallback if Groq free tier is ever insufficient.

Anthropic Claude API was explicitly rejected for public widget use due to cost exposure risk on unauthenticated public traffic.

Ollama self-hosted was rejected because exposing internal inference infrastructure to public internet traffic adds attack surface to the internal network. The CISO concern outweighs the zero-cost benefit.

Security Controls

Defense in depth. The widget is public-facing and unauthenticated.

| Layer | Control | Purpose |
|---|---|---|
| Cloudflare Turnstile | Bot verification on first message | Blocks scripted abuse |
| Cloudflare Worker rate limiting | 100 requests per IP per hour | Prevents API cost attacks |
| CORS locked to docs.peries.ca | Worker rejects other origins | Prevents hotlinking |
| Max 400 tokens per response | Hard cap in Groq API call | Limits per-request cost |
| System prompt lockdown | Agent cannot follow user instructions | Blocks prompt injection |
| No internal context | Widget only sees current page text | No internal data exposure |
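Two of these layers, the origin check and the per-IP limit, can be sketched as follows. This is an illustration under stated assumptions, not the Worker's actual code: an in-memory `Map` stands in for the KV store, a fixed one-hour window is assumed (the ADR does not say which windowing scheme is used), and the function names are hypothetical.

```typescript
// Hypothetical sketch: an in-memory Map stands in for the Worker's KV store.

const ALLOWED_ORIGIN = "https://docs.peries.ca";
const LIMIT_PER_HOUR = 100;
const WINDOW_MS = 60 * 60 * 1000;

interface WindowCounter {
  windowStart: number; // epoch ms when the current window began
  count: number; // requests seen in the current window
}

const counters = new Map<string, WindowCounter>();

// Accepts only requests whose Origin header is exactly the docs site.
function isAllowedOrigin(origin: string | null): boolean {
  return origin === ALLOWED_ORIGIN;
}

// Fixed-window counter: returns true while the IP is within its hourly limit.
function allowRequest(ip: string, now: number): boolean {
  const current = counters.get(ip);
  if (current === undefined || now - current.windowStart >= WINDOW_MS) {
    // First request from this IP, or the previous window has expired.
    counters.set(ip, { windowStart: now, count: 1 });
    return true;
  }
  if (current.count >= LIMIT_PER_HOUR) {
    return false;
  }
  current.count += 1;
  return true;
}
```

A fixed window is the simplest scheme to store as a single counter per IP. It allows a short burst across a window boundary, which is likely acceptable for a limit whose job is to protect a free-tier quota.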

Known Security Tradeoff — Turnstile Single-Use Token

Cloudflare Turnstile tokens are single-use. After the first message is sent, the token is consumed. Subsequent messages in the same session send an empty token, and the Worker skips Turnstile verification for those requests.

This means the Origin check is the only gate for message 2 and beyond. CORS is a browser-enforced policy; nothing outside a browser is bound by it. Any server-side script can set a forged Origin: https://docs.peries.ca header and send repeated POST requests to the Worker, bypassing Turnstile entirely.
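The gap can be made concrete with a small sketch of the gate decision described here. The type and function names are hypothetical; the logic only mirrors the behaviour stated in this section (reject other origins, verify Turnstile when a token is present, skip it when the token is empty).

```typescript
// Hypothetical sketch of the per-request gate described in this section.

interface IncomingMessage {
  origin: string | null; // value of the Origin header, if any
  turnstileToken: string; // empty on every message after the first
}

type Gate = "reject" | "verify-turnstile" | "skip-turnstile";

function gateFor(msg: IncomingMessage): Gate {
  if (msg.origin !== "https://docs.peries.ca") {
    return "reject";
  }
  // Turnstile tokens are single-use, so later messages arrive without one.
  return msg.turnstileToken === "" ? "skip-turnstile" : "verify-turnstile";
}

// A script that sets the Origin header itself and omits the token looks
// identical to a legitimate follow-up message from the widget.
const forged: IncomingMessage = { origin: "https://docs.peries.ca", turnstileToken: "" };
const forgedOutcome = gateFor(forged);
```

Here `forgedOutcome` is `"skip-turnstile"`, which is the same path a real second message takes. Only the per-IP rate limit still applies to it.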

What an attacker can do: Loop requests to burn the Groq free tier quota. Once a single IP exceeds the KV rate limit (100 req/IP/hr), the Worker returns the friendly rate-limit message to every request from that IP, including real users who share it.

How serious is it: Moderate in theory, low in practice. The Groq free tier is the natural ceiling — not a billing account. The blast radius is degraded availability for Archy, not a data breach or a bill.

The proper fix: Issue a short-lived HMAC session token after Turnstile verification passes. The Worker signs it; the widget sends it on subsequent requests; the Worker verifies the signature. This closes the forged-Origin gap. It is the right engineering answer, but it is over-engineered for a homelab docs chatbot on a free tier quota.
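For reference, a minimal sketch of that pattern. Everything specific here is an assumption: the token layout (`expiry.signature`), the 15-minute lifetime, and the function names. It uses Node's `node:crypto` module so the example is synchronous and self-contained; a Cloudflare Worker would more likely use the Web Crypto API (`crypto.subtle`) for the same HMAC-SHA256 operation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch: token layout, lifetime, and names are assumptions.
const TOKEN_TTL_MS = 15 * 60 * 1000; // assumed 15-minute lifetime

// HMAC-SHA256 of the payload, hex-encoded.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Issued once, after Turnstile verification succeeds.
function issueSessionToken(secret: string, now: number): string {
  const expiresAt = String(now + TOKEN_TTL_MS);
  return `${expiresAt}.${sign(expiresAt, secret)}`;
}

// Checked on every later message in place of a Turnstile token.
function verifySessionToken(token: string, secret: string, now: number): boolean {
  const parts = token.split(".");
  if (parts.length !== 2) return false;
  const [expiresAt, signature] = parts;
  const expected = Buffer.from(sign(expiresAt, secret), "hex");
  const provided = Buffer.from(signature, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (provided.length !== expected.length) return false;
  if (!timingSafeEqual(provided, expected)) return false;
  const expiry = Number(expiresAt);
  return Number.isFinite(expiry) && now < expiry;
}
```

Because the expiry is part of the signed payload, a client cannot extend its own session without the Worker's secret. A token in this form can still be replayed until it expires, so the per-IP rate limit would remain in place alongside it.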

Decision: Accept the tradeoff. If abuse becomes a real problem, implement the HMAC session token pattern at that point.

Consequences

  • Ask Archy widget deployed to docs.peries.ca via MkDocs extra_javascript
  • Cloudflare Worker ask-archy deployed in archon-web repo
  • GROQ_API_KEY stored in Infisical archon-platform project
  • Cloudflare Turnstile sitekey and secret stored in Infisical
  • peries-ca-translator pattern reused for Worker structure
  • No changes to internal infrastructure or internal agents
  • Future: same widget pattern can be applied to peries.ca portfolio site

Future Considerations (Backlog)

AnythingLLM as full RAG platform: AnythingLLM (MIT licensed, self-hosted via Docker) was evaluated as an alternative. It would provide proper RAG across all docs rather than current-page context only, and supports an embeddable widget. Minimum requirements are 2GB RAM and 2 vCPU -- deployable as an Azure B1ms instance (~$15 CAD/month) connected to Groq free tier for inference.

This is deferred because the Cloudflare Worker approach covers the immediate need at zero cost with lower operational complexity.

Internal AI assistant for Archon operators: AnythingLLM or a similar platform could serve as an internal knowledge base agent -- accessible via MS Teams, Slack, or a private web UI -- allowing operators to query platform documentation, ADRs, runbooks, and operational context conversationally. This is a Phase 5+ consideration and requires authentication, access control, and integration with internal tooling. It is tracked as a backlog item separate from the public-facing Ask Archy widget.

References

  • ADR-0002 (Infisical for secrets) -- GROQ_API_KEY and Turnstile secrets stored in Infisical
  • ADR-0017 (Public docs pipeline) -- docs.peries.ca deployment via Cloudflare Pages
  • ADR-0021 (Claude Code developer tooling) -- widget authored with Claude Code

Addendum — 2026-04-12

Rate limit raised from 30 to 100 req/IP/hr (WI #164, WI #165)

The initial rate limit of 30 req/IP/hr proved too restrictive for legitimate multi-turn conversations. A curious visitor asking several follow-up questions could exhaust the limit within a single session. The limit was raised to 100 req/IP/hr, which accommodates genuine use while still providing meaningful protection against API quota abuse.

The rate-limit response was also changed from a 429 error to a friendly 200 { reply: "..." } so the widget displays it as a regular Archy chat bubble rather than a silent failure (WI #166).
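A sketch of why the 200 response shape matters, assuming the widget renders any `reply` string from a successful response as a chat bubble. The `WorkerResult` type, the function names, and the message wording are hypothetical; a plain object is used in place of a real `Response` to keep the example self-contained.

```typescript
// Hypothetical sketch: a plain object stands in for an HTTP response.

interface WorkerResult {
  status: number;
  body: string;
}

// Illustrative wording only; the real message is not quoted in this ADR.
const RATE_LIMIT_REPLY = "I need a short break. Please try again in a little while.";

// Worker side: a rate-limited request gets the same shape as a normal answer.
function rateLimitedResult(): WorkerResult {
  return { status: 200, body: JSON.stringify({ reply: RATE_LIMIT_REPLY }) };
}

// Widget side: returns the text to show as a chat bubble, or null if the
// response cannot be displayed.
function replyFromResult(result: WorkerResult): string | null {
  if (result.status !== 200) return null;
  const parsed: unknown = JSON.parse(result.body);
  if (typeof parsed === "object" && parsed !== null && "reply" in parsed) {
    const reply = (parsed as { reply: unknown }).reply;
    return typeof reply === "string" ? reply : null;
  }
  return null;
}
```

With this shape the widget needs no special case for rate limiting: the limit message travels through the same path as an ordinary answer, whereas a 429 would fall into the `null` branch.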

System prompt expanded (WI #161, WI #163)

Archy's system prompt was updated to include full self-awareness: name origin, creator attribution, ADR-0030 reference, and platform description. A general language-detection instruction was added: "Detect the language the user writes in and always respond in that same language. Default to English if the language cannot be determined."

sessionStorage persistence added (WI #160)

Chat history is now persisted to sessionStorage under the key archy-history. History survives MkDocs SPA-style page navigation within the same tab. Closing the widget with the X button clears the history for a fresh start. The Turnstile token is not re-requested when history is restored after a navigation.
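The persistence behaviour can be sketched as three small helpers. The key `archy-history` comes from this addendum; the message shape, the function names, and the `StorageLike` interface are assumptions. `StorageLike` is the subset of the browser `Storage` API used here, so `window.sessionStorage` can be passed in directly.

```typescript
// Hypothetical sketch: message shape and function names are assumptions.

interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

// Subset of the browser Storage interface that these helpers rely on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const HISTORY_KEY = "archy-history";

// Called after every message so a page navigation never loses a turn.
function saveHistory(storage: StorageLike, messages: StoredMessage[]): void {
  storage.setItem(HISTORY_KEY, JSON.stringify(messages));
}

// Called when the widget initializes on a new page in the same tab.
function loadHistory(storage: StorageLike): StoredMessage[] {
  const raw = storage.getItem(HISTORY_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as StoredMessage[]) : [];
  } catch {
    // Corrupt or unexpected data: start with an empty conversation.
    return [];
  }
}

// Called when the visitor closes the widget with the X button.
function clearHistory(storage: StorageLike): void {
  storage.removeItem(HISTORY_KEY);
}
```

In the browser this would be used as `saveHistory(window.sessionStorage, messages)`. sessionStorage is scoped to the tab, which matches the stated behaviour: history follows the visitor across pages but does not leak into other tabs or later visits.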