Platform Infrastructure Overview
Overview
The Archon Platform runs on a self-hosted k3s cluster with a layered automation stack: Ansible for node lifecycle management, Helm for workload deployment, Terraform for cloud and DNS resources, and a network boundary enforced by a KVM-resident OPNsense firewall. Every layer follows a naming and organisation convention that keeps the fleet maintainable by a single operator.
Orchestration: k3s
k3s is the Kubernetes distribution of choice for the platform. It runs on bare-metal Linux nodes with a dedicated control-plane placement policy that protects the control plane from workload resource contention. The control-plane node selection decision is documented in PLAT-0003.
Namespaces follow a functional segmentation model: platform infrastructure, monitoring, applications, OT workloads, and security tooling each get their own namespace. This limits the blast radius of a misconfiguration or compromise to a single namespace and enables per-namespace RBAC. The namespace design is documented in PLAT-0002.
Helm releases follow a consistent naming convention that matches the namespace and function of the workload, making helm list output self-documenting.
Network Boundary: OPNsense on KVM
The platform firewall runs OPNsense as a KVM virtual machine on the primary infrastructure node. Running the firewall as a VM (rather than a Docker container or a dedicated appliance) provides hardware-level network isolation without requiring additional physical hardware.
The decision to run OPNsense on KVM, including the alternatives considered, is in PLAT-0001.
TLS Pipeline: cert-manager + Traefik
All platform-facing services are served over HTTPS. cert-manager handles the certificate lifecycle using the Let's Encrypt DNS-01 challenge, which proves domain control through a DNS TXT record and so does not require a publicly accessible HTTP endpoint -- suitable for services that are reachable only via VPN or internal DNS.
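To illustrate why DNS-01 needs no reachable HTTP endpoint, the sketch below computes the TXT record that an ACME client such as cert-manager publishes for a challenge, following RFC 8555 section 8.4: the record lives at _acme-challenge.<domain>, and its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The function name and the token, thumbprint, and domain values are hypothetical placeholders, not taken from the platform.

```python
import base64
import hashlib


def dns01_txt_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (record_name, record_value) for an ACME DNS-01 challenge."""
    # Key authorization = challenge token + "." + thumbprint of the account key.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # The TXT value is the base64url digest with padding stripped.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # A wildcard name is validated at its base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}", value


# Placeholder inputs for illustration only.
name, value = dns01_txt_record(
    "grafana.internal.example.com", "placeholder-token", "placeholder-thumbprint"
)
print(name, value)
```

The certificate authority only queries DNS for this record, so the service itself can stay behind the VPN for the whole issuance and renewal cycle.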
Traefik acts as the ingress controller. IngressRoutes are configured per service with TLS termination and automatic HTTP-to-HTTPS redirection. The cert-manager strategy is documented in PLAT-0005; the Traefik IngressRoute pattern is documented in PLAT-0006.
Fleet Conventions: Naming and OS
All nodes follow a structured naming convention that encodes cluster, zone, function, and sequence number. Node operating systems are standardised within each function class. The fleet conventions, including the node naming pattern and OS baseline, are documented in PLAT-0004.
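As a sketch of how a structured hostname can be validated and decoded, the example below parses a name into cluster, zone, function, and sequence number. The pattern `<cluster>-<zone>-<function>-<NN>` and the sample hostname are assumptions made for illustration; the actual pattern is the one defined in PLAT-0004 and may differ.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: <cluster>-<zone>-<function>-<two-digit sequence>.
# The real convention is defined in PLAT-0004.
NODE_NAME = re.compile(
    r"(?P<cluster>[a-z0-9]+)-(?P<zone>[a-z0-9]+)-(?P<function>[a-z]+)-(?P<seq>\d{2})"
)


@dataclass(frozen=True)
class NodeName:
    cluster: str
    zone: str
    function: str
    seq: int


def parse_node_name(hostname: str) -> NodeName:
    """Split a hostname into its encoded fields, or raise ValueError."""
    match = NODE_NAME.fullmatch(hostname)
    if match is None:
        raise ValueError(f"hostname does not follow the naming convention: {hostname!r}")
    return NodeName(
        cluster=match["cluster"],
        zone=match["zone"],
        function=match["function"],
        seq=int(match["seq"]),
    )


print(parse_node_name("arc-z1-cp-01"))
```

A check of this kind could run before a node is added to an inventory, so that a malformed name is rejected early rather than discovered later in dashboards or playbook output.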
Power outage resilience and flood detection are operational concerns that affect node placement and decisions about which services must stay always on. They are documented in PLAT-0007 and PLAT-0008.
Ansible Automation
Node lifecycle -- initial provisioning, role application, configuration drift correction -- is managed by Ansible. Playbooks run against typed inventories (IT and OT inventories are separate). Ansible quality is enforced by lint rules and a production-profile requirement that prohibits bare module names; all modules use fully-qualified collection names (FQCN).
The Ansible quality enforcement policy is documented in PLAT-0009.
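The FQCN requirement can be illustrated with a small check over already-parsed tasks. This is a simplified sketch, not how ansible-lint implements its rule: it treats any task key outside a short, partial keyword set as a module name and flags it unless it has the form namespace.collection.module. The keyword set, function names, and sample tasks are assumptions, and nested blocks are not handled.

```python
# Task-level keywords that are not module names (partial, illustrative set).
TASK_KEYWORDS = {
    "name", "when", "loop", "register", "become", "tags", "vars",
    "notify", "changed_when", "failed_when", "delegate_to", "ignore_errors",
}


def is_fqcn(module: str) -> bool:
    """True if the name has the form namespace.collection.module."""
    parts = module.split(".")
    return len(parts) >= 3 and all(parts)


def bare_modules(tasks: list[dict]) -> list[str]:
    """Return the module names in the tasks that are not fully qualified."""
    found = []
    for task in tasks:
        for key in task:
            if key not in TASK_KEYWORDS and not is_fqcn(key):
                found.append(key)
    return found


# Sample tasks as they would look after YAML parsing (hypothetical content).
tasks = [
    {"name": "Install chrony", "ansible.builtin.package": {"name": "chrony"}},
    {"name": "Copy config", "copy": {"src": "chrony.conf", "dest": "/etc/chrony.conf"}, "become": True},
]
print(bare_modules(tasks))
```

Here the second task would be flagged because it uses the bare name copy instead of ansible.builtin.copy.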
Cloud and Terraform
Terraform manages cloud resources: Cloudflare DNS, Azure infrastructure, and any future cloud expansion. The multi-cloud strategy, including the provider hierarchy and the criteria for adding new providers, is documented in PLAT-0010.
Key Properties
- k3s with dedicated control-plane placement and functional namespace segmentation
- OPNsense firewall on KVM -- network boundary separate from workload hosting
- Automated TLS via cert-manager + Traefik -- no manual certificate management
- Ansible with FQCN enforcement -- automation quality is a CI gate
- Terraform for all cloud resources -- no manual DNS or cloud console changes