
Migrated from ADR-0016 on 2026-05-02 per ADR-0047. This source file is retained as a reference; the canonical content is in PLAT-0002.

PLAT-0002 — k3s Namespace Design: Domain-Based Pattern

Field | Value
Status | Accepted
Date | 2026-04-03
Author | Ben Peries
Sources | ADR-0016

Context

The k3s cluster runs with caneast-site1-node3 as the control-plane node and caneast-site1-node2 as the worker node. Workloads span infrastructure tooling, monitoring, application services, OT-facing pipelines, and security. A namespace strategy is needed before any workloads are deployed.

Two patterns were evaluated:

  • Pattern A (environment-based): dev, staging, and prod namespaces. Each environment contains all workloads.
  • Pattern B (domain/team-based): namespaces map to functional domains. Each namespace owns a category of workloads.

Decision

Pattern B — domain/team-based namespaces.

Namespace map

Namespace | Purpose | Example workloads
archon-infra | Core platform tools | AWX (runs in awx; see Exceptions), Infisical agents, ArgoCD
archon-monitoring | Observability stack | Grafana, InfluxDB, Telegraf, Prometheus
archon-apps | Application workloads | Node-RED, peries.ca, Homepage
archon-ot | OT-facing services | MQTT bridge, OT data processors
archon-security | Security tooling | CrowdSec, Suricata, Falco, Conpot
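As a minimal sketch, assuming the namespaces are kept as declarative manifests, the five domain namespaces in the map could be rendered as a single YAML stream. The function name render_namespaces and the archon/domain label are hypothetical conventions for illustration, not part of this decision.

```shell
# Sketch: render one Namespace manifest per archon-* domain as a YAML stream.
# The "archon/domain" label is a hypothetical convention, not part of PLAT-0002.
render_namespaces() {
  local ns
  for ns in archon-infra archon-monitoring archon-apps archon-ot archon-security; do
    # "---" separates documents so the whole stream can be applied at once.
    printf '%s\n' '---' \
      'apiVersion: v1' \
      'kind: Namespace' \
      'metadata:' \
      "  name: $ns" \
      '  labels:' \
      "    archon/domain: ${ns#archon-}"
  done
}
```

With kubectl access to the cluster, `render_namespaces | kubectl apply -f -` creates or updates all five, or the output can be written to a file under archon-platform/k3s/namespaces/ so the namespaces stay declared in Git.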

Rationale

  • Maps directly to the IT/OT separation story established in IAM-0002 — OT workloads are isolated in archon-ot, security tooling in archon-security
  • Mirrors enterprise domain ownership model — each namespace has a clear owner and purpose, even in a single-operator platform
  • Supports per-namespace RBAC — when AWX (Phase 3) manages deployments, service accounts can be scoped to their domain namespace only
  • Enables per-namespace NetworkPolicies, so archon-ot can restrict egress to the MQTT broker and InfluxDB only while archon-security can access all namespaces for monitoring
  • Aligns with ArgoCD ApplicationSets — one Application per namespace, clean GitOps sync boundaries
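The archon-ot egress restriction from the rationale could look roughly like the policy below. This is a sketch under several assumptions: InfluxDB pods in archon-monitoring carry the label app: influxdb and listen on 8086, the MQTT broker speaks plain MQTT on 1883 at an address passed in as a CIDR (the default is a documentation placeholder), and DNS must stay reachable in kube-system. It relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace. k3s enforces NetworkPolicies through its embedded kube-router based controller unless started with --disable-network-policy.

```shell
# Sketch: once this policy selects the archon-ot pods for Egress, only the
# listed destinations are allowed. Labels, ports and broker CIDR are assumptions.
render_ot_egress_policy() {
  # Hypothetical placeholder (TEST-NET-1); pass the real broker CIDR as $1.
  local broker_cidr="${1:-192.0.2.10/32}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ot-egress-allowlist
  namespace: archon-ot
spec:
  podSelector: {}          # applies to every pod in archon-ot
  policyTypes:
    - Egress
  egress:
    # DNS to CoreDNS in kube-system; without it name resolution fails.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # InfluxDB in archon-monitoring (the app label is assumed).
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: archon-monitoring
          podSelector:
            matchLabels:
              app: influxdb
      ports:
        - protocol: TCP
          port: 8086
    # MQTT broker (plain MQTT on 1883 is assumed).
    - to:
        - ipBlock:
            cidr: ${broker_cidr}
      ports:
        - protocol: TCP
          port: 1883
EOF
}
```

Usage would be along the lines of `render_ot_egress_policy "$MQTT_BROKER_CIDR" | kubectl apply -f -`, where MQTT_BROKER_CIDR is a hypothetical variable holding the broker's address. Putting namespaceSelector and podSelector in the same `to` entry means both must match, which keeps the InfluxDB rule from opening all of archon-monitoring.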

Alternatives Considered

Pattern A — environment-based (dev, staging, prod)

Rejected. Archon is a single-environment homelab — there is no staging or prod distinction at this scale. Creating dev/staging/prod namespaces would result in workloads only ever running in prod, with the other namespaces empty. Environment separation is better handled at the ADO pipeline level (environment gates) than at the namespace level.

Flat default namespace

Rejected. Placing all workloads in default provides zero isolation, no RBAC granularity, and no NetworkPolicy boundaries, and makes it impossible to reason about the blast radius of a misconfigured deployment.

Exceptions

AWX Operator — awx namespace

The AWX Operator upstream (github.com/ansible/awx-operator/config/default) hardcodes the awx namespace in its kustomization manifests. Deploying to archon-infra was attempted but produced namespace mismatch errors on serviceaccount, role, rolebinding, configmap, service, and deployment resources — the operator's own RBAC and resource references assume awx and cannot be overridden cleanly via a downstream kustomization namespace field.

Decision: accept awx as a fixed upstream constraint, not a local naming choice. AWX deploys to the awx namespace. All other workloads follow the archon-* domain-based pattern.

Reference: https://github.com/ansible/awx-operator/issues/2002
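For reference, the AWX Operator's documented kustomize install keeps the namespace in a top-level namespace field, which under this exception stays awx. The sketch below renders that kustomization; the release tag is deliberately a required argument because no specific operator version is pinned here, and the function name is hypothetical.

```shell
# Sketch: render the AWX Operator kustomization pinned to a given release tag.
render_awx_kustomization() {
  if [ -z "${1:-}" ]; then
    echo "usage: render_awx_kustomization <awx-operator-release-tag>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/ansible/awx-operator/config/default?ref=$1
images:
  - name: quay.io/ansible/awx-operator
    newTag: $1
# Fixed upstream constraint: keep the namespace the operator's manifests assume.
namespace: awx
EOF
}
```

A typical flow would be `render_awx_kustomization "$AWX_OPERATOR_TAG" > kustomization.yaml` followed by `kubectl apply -k .`, with AWX_OPERATOR_TAG a hypothetical variable set to the chosen release. Keeping the git ref and image tag on the same value avoids running an operator image that does not match its manifests.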

Consequences

  • All k3s workloads must declare namespace explicitly in their manifests — no implicit default
  • No deployments to the default namespace — it remains empty
  • Helm charts and ArgoCD Applications must target a specific archon-* namespace
  • Namespace creation managed via k3s manifests in archon-platform/k3s/namespaces/
  • Future multi-environment support (if needed) would layer environment labels on resources within domain namespaces, not create parallel namespace trees
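The first three consequences can be enforced mechanically. The following is a minimal grep-based sketch of a pre-commit or CI check, assuming one resource per file with metadata.namespace at two-space indentation. It skips Namespace objects (which are cluster-scoped) but does not handle other cluster-scoped kinds, multi-document files, or unrendered Helm templates. The function name and the approved list (the five archon-* namespaces plus the awx exception) are illustrative.

```shell
# Sketch: fail if a manifest omits metadata.namespace or targets a namespace
# outside the approved set. Quoted or unquoted values are both accepted.
check_namespaces() {
  local rc=0 f kind ns
  for f in "$@"; do
    kind="$(sed -n 's/^kind:[[:space:]]*//p' "$f" | head -n 1 | tr -d ' \r')"
    # Namespace objects are cluster-scoped, so they carry no metadata.namespace.
    if [ "$kind" = "Namespace" ]; then
      continue
    fi
    # First two-space-indented "namespace:" key, with quotes, spaces and CR stripped.
    ns="$(sed -n 's/^  namespace:[[:space:]]*//p' "$f" | head -n 1 | tr -d "\"' \r")"
    case "$ns" in
      archon-infra|archon-monitoring|archon-apps|archon-ot|archon-security|awx)
        # Approved domain namespace, or the documented awx exception.
        ;;
      "")
        echo "$f: no explicit metadata.namespace" >&2
        rc=1
        ;;
      *)
        echo "$f: namespace '$ns' is not an approved namespace" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}
```

Run it as `check_namespaces path/to/manifests/*.yaml` (placeholder path); a non-zero exit status flags any manifest that would land in default or an unplanned namespace.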

References

  • IAM-0002 — IT/OT zone separation policy

Addendum — 2026-04-12

caneast-site1-node2 ServiceLB disabled (WI #175)

Observation: k3s ServiceLB was binding port 443 on caneast-site1-node2 (worker node) for Traefik's LoadBalancer service, conflicting with the Nginx Proxy Manager (NRP) Docker container, which also binds port 443 on that host. NRP went offline as a result.

Temporary fix applied:

```shell
kubectl label node caneast-site1-node2 svccontroller.k3s.cattle.io/enablelb=false
```

k3s-agent on caneast-site1-node2 and k3s on caneast-site1-node3 were then restarted. The Traefik LoadBalancer external IP now shows only REDACTED (caneast-site1-node3), and NRP owns port 443 on caneast-site1-node2 cleanly.
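To confirm the label is still in place, for example after a reboot, a small check can read it back from the Node object. This sketch uses kubectl's JSONPath output with the dots in the label key escaped. The function name lb_disabled_on and the KUBECTL override (present only so the check can be dry-run against a stub) are hypothetical.

```shell
# Sketch: succeed only if the ServiceLB label on the given node is "false".
# KUBECTL is a hypothetical override; it defaults to the real kubectl.
lb_disabled_on() {
  local node="$1" value
  # Dots inside the label key are escaped for kubectl's JSONPath syntax.
  value="$("${KUBECTL:-kubectl}" get node "$node" \
    -o jsonpath='{.metadata.labels.svccontroller\.k3s\.cattle\.io/enablelb}')"
  [ "$value" = "false" ]
}
```

For example, `lb_disabled_on caneast-site1-node2 && echo "ServiceLB disabled"`. For a quick view across all nodes, `kubectl get nodes -L svccontroller.k3s.cattle.io/enablelb` shows the label as an extra column.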

Permanent fix (WI #175, backlog): caneast-site1-node2 should not run both a k3s worker and NRP long term. Options:

  1. Move NRP to port 8443 and update all Cloudflare/DNS records accordingly
  2. Migrate all proxy hosts from NRP to Traefik IngressRoute resources and retire NRP

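Label persistence: the label lives on the Node object in the cluster datastore (etcd, or SQLite on a default single-server k3s install), so it survives reboots and agent restarts without any additional k3s configuration. Nothing re-applies it, however; if the Node object were ever deleted and the node re-registered, the label would have to be set again.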

Status: Temporary fix is active. Permanent resolution tracked in WI #175.