Migrated from ADR-0034 on 2026-05-02 per ADR-0047. Source file retained with deprecation banner at
docs/adr/0034-k3s-cp-migration-caneast-site1-node4.md.
PLAT-0003 — k3s Control-Plane Migration to caneast-site1-node4
| Field | Value |
|---|---|
| Status | Implemented |
| Date | 2026-04-19 |
| Author | Ben Peries |
| WI | WI-246 |
| Sources | ADR-0034 |
Context
The Archon k3s cluster was initialised on caneast-site1-node3 (REDACTED, 16 GB RAM, Intel Xeon CanEast Server) as the sole control-plane node. caneast-site1-node4 (REDACTED, 32 GB RAM, Intel i5-9500T) has been provisioned and joined as a worker as of 2026-04-19. caneast-site1-node4 is the higher-capacity node and is a better long-term home for the control-plane, freeing caneast-site1-node3 to focus on KVM (OPNsense VM) and Infisical (Docker Compose, separate from k3s).
The cluster uses k3s v1.34.6+k3s1 with SQLite as the embedded datastore (state.db / state.db-shm / state.db-wal). No embedded etcd is in use. AWX runs in the awx namespace with PVCs on the local-path provisioner, pinning storage to caneast-site1-node3 local disk.
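The datastore claim above can be checked on a server node with a small helper. This is a minimal sketch that assumes the default k3s data directory (/var/lib/rancher/k3s/server) and the usual layout, where embedded etcd keeps its data under db/etcd; the function name is hypothetical.

```shell
# Report which embedded datastore a k3s server directory appears to hold.
# $1: server data dir (assumption: default k3s location if omitted).
k3s_datastore_kind() (
  server_dir="${1:-/var/lib/rancher/k3s/server}"
  # Check etcd first: a cluster converted from SQLite may still have state.db on disk.
  if [ -d "$server_dir/db/etcd" ]; then
    echo etcd
  elif [ -f "$server_dir/db/state.db" ]; then
    echo sqlite
  else
    echo unknown
  fi
)
```

The server directory is normally readable only by root, so on a real node the call would be made from a root shell.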
Decision
Migrate the k3s control-plane from caneast-site1-node3 to caneast-site1-node4 by:
- Stopping k3s on caneast-site1-node3 cleanly (verify the service is inactive, then allow 5 s for the SQLite WAL to flush).
- Archiving the server state (db, tls, cred, token dirs) and transferring to caneast-site1-node4 via SSH pipe.
- Installing k3s server on caneast-site1-node4 without --cluster-init (that flag initialises embedded etcd, whereas this cluster must keep running on the restored SQLite state). The restored token and TLS material are used directly.
- Adding --tls-san flags for REDACTED and caneast-site1-node4 to ensure cert validity from new IPs.
- Removing caneast-site1-node3 from the cluster (kubectl delete node) and reinstalling it as a k3s agent pointing at caneast-site1-node4:6443.
- Updating kubeconfig on CanEast AI Node WSL (REDACTED) to point at REDACTED:6443. The kubernetes MCP server (mcp-server-kubernetes) resolves via KUBECONFIG, so no .mcp.json changes are required.
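The stop, archive, and install steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure as executed: it assumes the default k3s data directory, root privileges on both nodes, and a hypothetical SSH login (admin@caneast-site1-node4). NODE4_IP stands in for the redacted address, and the function names are invented for this example.

```shell
# Stop the k3s server and confirm it is down before touching the SQLite files.
stop_k3s_cleanly() (
  sudo systemctl stop k3s
  # is-active exits 0 only while the unit is still active.
  if systemctl is-active --quiet k3s; then
    echo "k3s is still active" >&2
    exit 1
  fi
  sleep 5   # margin for the SQLite WAL to flush
)

# Stream a gzip-compressed tar of the server state to stdout.
# $1: server data dir (assumption: default k3s location if omitted).
archive_server_state() (
  server_dir="${1:-/var/lib/rancher/k3s/server}"
  # Refuse to ship a partial archive if any expected entry is missing.
  for entry in db tls cred token; do
    if [ ! -e "$server_dir/$entry" ]; then
      echo "missing: $server_dir/$entry" >&2
      exit 1
    fi
  done
  tar -C "$server_dir" -czf - db tls cred token
)

# Example invocation (hypothetical login; target directory must already exist):
#   stop_k3s_cleanly && archive_server_state \
#     | ssh admin@caneast-site1-node4 'sudo tar -C /var/lib/rancher/k3s/server -xzf -'
#
# Then, on caneast-site1-node4, install the server without --cluster-init:
#   curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION='v1.34.6+k3s1' \
#     sh -s - server --tls-san "$NODE4_IP" --tls-san caneast-site1-node4
```

Streaming the archive through the SSH pipe avoids leaving a copy of the token and TLS keys in a temporary file on either node.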
AWX downtime is accepted. AWX PVCs remain on caneast-site1-node3 local-path storage. After caneast-site1-node3 rejoins as a worker, AWX pods reschedule back onto caneast-site1-node3 automatically (affinity by PVC locality). Estimated AWX downtime: 15–30 min.
Rollback
If the caneast-site1-node4 API does not respond within 10 minutes of k3s server start
(checked at 2 min / 5 min / 10 min via systemctl status, server logs, and
curl -k https://REDACTED:[REDACTED]/livez):
- Restart k3s on caneast-site1-node3 (sudo systemctl start k3s); its state is intact.
- Uninstall any partial k3s server install on caneast-site1-node4.
- Reinstall caneast-site1-node4 as an agent pointing at caneast-site1-node3:[REDACTED] (restore Phase 1 state).
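The 2 min / 5 min / 10 min check that gates this rollback could be automated along these lines. This is a hedged sketch: the function name and the PROBE variable are hypothetical, and, following the curl -k check above, any HTTP response is treated as "responding". Adding -f to the curl flags would require a 2xx status, which presumes the API server permits unauthenticated access to /livez.

```shell
# Probe command; override PROBE to substitute a different check.
PROBE="${PROBE:-curl -ks --max-time 5 -o /dev/null}"

# Poll a URL at fixed checkpoints and return 0 on the first success.
# $1: URL to probe; remaining args: checkpoints in seconds since start, ascending.
wait_for_livez() (
  url="$1"; shift
  elapsed=0
  for checkpoint in "$@"; do
    sleep "$((checkpoint - elapsed))"
    elapsed="$checkpoint"
    # PROBE is intentionally unquoted so its flags split into separate words.
    if $PROBE "$url"; then
      echo "healthy at ${checkpoint}s"
      exit 0
    fi
    echo "not healthy at ${checkpoint}s" >&2
  done
  exit 1
)
```

A call such as wait_for_livez "$LIVEZ_URL" 120 300 600, where LIVEZ_URL holds the redacted endpoint, returns non-zero only after the 10-minute checkpoint fails, which is the point at which the rollback steps apply.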
Consequences
- caneast-site1-node4 becomes the k3s API endpoint: https://REDACTED:6443
- caneast-site1-node3 transitions to worker; KVM and Infisical are unaffected.
- All kubeconfig references and any hardcoded cluster endpoint references must be updated to REDACTED.
- A future HA migration (embedded etcd, 3+ control-plane nodes) will require a new ADR that supersedes this one.
- AWX PVCs remain local-path on caneast-site1-node3; migrating them to a shared storage solution (e.g., Longhorn) is deferred to a future ADR.
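Finding and updating the endpoint references might look like the sketch below. Both function names are hypothetical, and the rewrite assumes a single-cluster kubeconfig such as the one k3s generates, because it replaces every server: line it finds. For a multi-cluster kubeconfig, kubectl config set-cluster with --server is the safer route.

```shell
# List lines under a directory that still mention the old endpoint.
# $1: old address (matched as a literal string); $2: directory to scan.
find_endpoint_refs() (
  old="$1"
  dir="${2:-.}"
  # -F treats the address as a fixed string, so dots are not regex wildcards.
  grep -rnF -- "$old" "$dir"
)

# Rewrite the server URL in a single-cluster kubeconfig.
# $1: kubeconfig path; $2: new server URL, e.g. https://<address>:6443
rewrite_kubeconfig_server() (
  kubeconfig="$1"
  new_url="$2"
  tmp="$(mktemp)"
  # Preserve the original indentation; write back with cat to keep file permissions.
  sed "s|^\([[:space:]]*server:[[:space:]]*\)https://.*|\1$new_url|" "$kubeconfig" > "$tmp" \
    && cat "$tmp" > "$kubeconfig"
  rm -f "$tmp"
)
```

find_endpoint_refs exits non-zero when nothing matches, so it can double as a post-migration check that no stale references remain.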
Alternatives Considered
Keep caneast-site1-node3 as control-plane indefinitely — rejected; caneast-site1-node4 is the better-resourced node and caneast-site1-node3 is already loaded with KVM + Infisical.
HA embedded etcd (3 control-plane nodes) — deferred; requires a third control-plane node and a new ADR. Not needed at current scale.
References
- PLAT-0002 — k3s namespace design (updated separately to reflect new CP node)
- WI-246 — caneast-site1-node4 baseline and k3s join
- k3s SQLite datastore docs: https://docs.k3s.io/datastore/embedded-sqlite
Addendum — 2026-04-21
Implemented 2026-04-19 (PR #270, #276). Verified in WI-295 audit 2026-04-21. Cluster state at verification: caneast-site1-node4 (CP), caneast-site1-node2 + caneast-site1-node3 + caneast-site1-node5 (workers), all v1.34.6+k3s1, SQLite datastore confirmed via kine.sock. Future HA migration (SQLite to embedded etcd) tracked as WI-299.
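A version check of the kind described in this addendum could be repeated with a helper like the one below. This is a sketch rather than the audit's actual tooling: the function name is hypothetical, and the kine.sock path in the comment assumes the default k3s data directory.

```shell
# Read one kubelet version per line on stdin; succeed only if every line equals $1.
all_versions_match() (
  expected="$1"
  count=0
  while IFS= read -r version; do
    [ -n "$version" ] || continue
    count=$((count + 1))
    if [ "$version" != "$expected" ]; then
      echo "mismatch: $version" >&2
      exit 1
    fi
  done
  # An empty node list counts as a failure, not a vacuous pass.
  [ "$count" -gt 0 ]
)

# Example invocation against the live cluster:
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#     | all_versions_match 'v1.34.6+k3s1'
#
# SQLite (kine) check on the control-plane node, path assumed:
#   sudo test -S /var/lib/rancher/k3s/server/kine.sock
```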