
IT Node Onboarding Golden Path

Four-step process for bringing a new physical node into the Archon fleet.

autoinstall  →  bootstrap.yml  →  it-baseline-all (AWX)  →  site.yml (AWX)

Step 1 — OS Install (autoinstall)

Boot the node from the Archon autoinstall USB. The cloud-init user-data:

  • Installs Ubuntu (current fleet version — see Inventory)
  • Creates the operator local admin account with your SSH key
  • Sets hostname, locale, timezone
  • Expands LVM to full disk
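As a rough illustration of the items above, an Ubuntu autoinstall user-data file covering them might look like the sketch below. This is not the actual Archon file: the hostname, locale, timezone, password hash, and SSH key are placeholder assumptions, and the real user-data may be structured differently.

```yaml
#cloud-config
# Illustrative sketch only; every value here is a placeholder, not the fleet's real config.
autoinstall:
  version: 1
  locale: en_US.UTF-8              # assumed locale
  timezone: Etc/UTC                # assumed timezone
  identity:
    hostname: node01               # hypothetical hostname
    username: operator             # local admin account named in this guide
    password: "<sha512-crypt hash>"   # crypted password, never plain text
  ssh:
    install-server: true
    allow-pw: true                 # bootstrap (Step 2) connects with password auth
    authorized-keys:
      - "ssh-ed25519 <your-public-key> you@workstation"
  storage:
    layout:
      name: lvm
      sizing-policy: all           # let the root LV use the whole volume group
```

The `sizing-policy: all` setting is one way to get the "expand LVM to full disk" behaviour; an equivalent result could also come from a late-command that extends the logical volume.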

After the install completes, the node reboots and is reachable at its DHCP address. Reserve a static IP on the Bell Giga Hub before proceeding.


Step 2 — Bootstrap (bootstrap.yml)

Runs as operator with password auth. Creates the ansible-svc-account service account so AWX can reach the node for all subsequent automation.

Add node to inventory

In ansible/inventories/it/hosts.yml, add the node to new_nodes:

new_nodes:
  hosts:
    <hostname>:
      ansible_host: <ip>
      ansible_port: 22
      ansible_user: operator
      ansible_python_interpreter: /usr/bin/python3

Run bootstrap

ansible-playbook -i ansible/inventories/it/hosts.yml \
  ansible/playbooks/it/bootstrap.yml \
  --limit new_nodes \
  --ask-pass --ask-become-pass

What it does:

  • Creates ansible-svc-account user (/bin/bash shell, home directory)
  • Creates /home/ansible-svc-account/.ssh with mode 0700
  • Deploys ~/.ssh/ansible-svc-account.pub from the control node as authorized_keys
  • Writes /etc/sudoers.d/ansible-svc-account (NOPASSWD:ALL, validated by visudo)
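The four actions listed above map onto standard Ansible modules. The play below is a minimal sketch of how such a bootstrap could be written, not the contents of the real bootstrap.yml; the `svc_user` variable name is hypothetical, and the `authorized_key` task assumes the `ansible.posix` collection is installed on the control node.

```yaml
# Illustrative sketch; the real bootstrap.yml may differ.
- name: Bootstrap the AWX service account
  hosts: new_nodes
  become: true
  vars:
    svc_user: ansible-svc-account    # hypothetical variable name
  tasks:
    - name: Create the service account with a home directory
      ansible.builtin.user:
        name: "{{ svc_user }}"
        shell: /bin/bash
        create_home: true

    - name: Create the .ssh directory
      ansible.builtin.file:
        path: "/home/{{ svc_user }}/.ssh"
        state: directory
        owner: "{{ svc_user }}"
        group: "{{ svc_user }}"
        mode: "0700"

    - name: Install the control node's public key as authorized_keys
      ansible.posix.authorized_key:
        user: "{{ svc_user }}"
        # Reads the key file on the control node, not on the target
        key: "{{ lookup('ansible.builtin.file', lookup('ansible.builtin.env', 'HOME') + '/.ssh/ansible-svc-account.pub') }}"

    - name: Grant passwordless sudo, validated before the file is installed
      ansible.builtin.copy:
        dest: "/etc/sudoers.d/{{ svc_user }}"
        content: "{{ svc_user }} ALL=(ALL) NOPASSWD:ALL\n"
        owner: root
        group: root
        mode: "0440"
        validate: /usr/sbin/visudo -cf %s
```

The `validate` step matters here: if the generated sudoers fragment has a syntax error, the copy is rejected rather than leaving sudo unusable on the node.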

Move node to permanent group

After bootstrap succeeds, move the host from new_nodes to its target group (nodes, docker_hosts, etc.) and update ansible_user to ansible-svc-account.
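For example, assuming the node is destined for the docker_hosts group, its entry after the move could look like the following sketch. The placeholders match the ones used when the node was first added, and the port stays at 22 at this stage.

```yaml
# Sketch of the host entry after bootstrap; docker_hosts is just one possible target group.
docker_hosts:
  hosts:
    <hostname>:
      ansible_host: <ip>
      ansible_port: 22                     # unchanged at this stage
      ansible_user: ansible-svc-account    # was: operator
      ansible_python_interpreter: /usr/bin/python3
```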


Step 3 — Baseline (AWX: it-baseline-all)

Run the it-baseline-all AWX job template, scoped to the new node.

What it applies (roles/common + security roles):

  • Hostname, timezone, NTP
  • Base packages
  • SSH hardening (ssh_hardening role — moves SSH to port REDACTED)
  • fail2ban, ufw
  • node_exporter (Prometheus metrics)
  • Power management (sleep/suspend targets masked, lid-switch actions disabled)

After this run:

  • SSH port is now 2222
  • Update ansible_port in hosts.yml for the node


Step 4 — Services (AWX: site.yml)

Run the AWX job template for the playbook that matches the node's role:

Role          Playbook         Adds
Docker host   site.yml         Docker, Portainer
KVM host      kvm.yml          KVM/libvirt bridges
Monitoring    monitoring.yml   Grafana, InfluxDB, Telegraf

Post-onboarding checklist

  • [ ] Static IP reserved (Bell Giga Hub — or OPNsense once active)
  • [ ] hosts.yml updated — correct group, ansible_port: 2222, ansible_user: ansible-svc-account
  • [ ] Node removed from new_nodes group
  • [ ] Device page created in docs/devices/<hostname>.md
  • [ ] docs/reference/inventory.md updated
  • [ ] docs/reference/network.md DNS rewrite added (<hostname>.home)
  • [ ] k3s worker join (if applicable): kubectl get nodes shows Ready

Reference

  • Inventory: ansible/inventories/it/hosts.yml
  • Bootstrap playbook: ansible/playbooks/it/bootstrap.yml
  • Baseline playbook: ansible/playbooks/it/baseline.yml
  • Site playbook: ansible/playbooks/it/site.yml
  • Service account variable: common_ansible_user in roles/common/defaults/main.yml