Migrated from ADR-0043 on 2026-05-02 per ADR-0047. This source file is retained as a reference; the canonical content is in PLAT-0005.
PLAT-0005 — cert-manager with Let's Encrypt DNS-01 via Cloudflare
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-04-28 |
| Author | Ben Peries |
| Sources | ADR-0043 |
| Epic | E2 Platform Security & Hardening (WI-258) |
Context
k3s workloads (Ingress, Gateway API, internal services) require TLS certificates. The platform currently uses
Nginx Proxy Manager (NPM) on caneast-site1-node2 to issue and renew a wildcard *.peries.ca certificate via Let's
Encrypt DNS-01 with Cloudflare. This approach:
- Requires manual renewal if NPM is unavailable.
- Cannot issue per-hostname certs for k3s Ingress resources without exporting the wildcard private key into Kubernetes Secrets by hand.
- Creates a dependency between caneast-site1-node2 (Docker host) and k3s cluster TLS.
A Kubernetes-native certificate lifecycle manager is needed to issue, rotate, and attach TLS certs directly to cluster workloads without exporting keys or relying on external tooling.
Decision
Install cert-manager v1.20.2 (Helm release archon-cert-manager, namespace cert-manager) as the
certificate lifecycle manager for the k3s cluster.
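The chart settings this ADR mentions (CRDs installed through crds.enabled, and the Prometheus ServiceMonitor) could be captured in a values.yaml along the following lines. This is a minimal sketch rather than the platform's actual values file; the install command in the comment is an assumption built from the release name, namespace and version stated above.

```yaml
# values.yaml for the jetstack/cert-manager chart (illustrative sketch).
# Assumed install command, not taken from the source:
#   helm upgrade --install archon-cert-manager jetstack/cert-manager \
#     --namespace cert-manager --create-namespace --version v1.20.2 -f values.yaml

# Install the cert-manager CRDs with the chart.
# Replaces the deprecated top-level installCRDs flag.
crds:
  enabled: true

# Expose metrics and create a ServiceMonitor so that an existing
# Prometheus Operator stack can scrape cert-manager.
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
```

Depending on how the Prometheus instance selects ServiceMonitors, extra labels may be needed on the ServiceMonitor; that detail is not covered by this ADR.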
Use Let's Encrypt DNS-01 challenge via Cloudflare for automated certificate issuance and renewal. The
Cloudflare API token is injected into the cluster by ESO (IAM-0005) from Infisical path
archon-platform/prod/k3s/CERT_MANAGER_CF_TOKEN — no token is committed to Git.
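As an illustration of the injection path, an ExternalSecret for the Cloudflare token might look roughly like this. The store name `infisical`, the target Secret name `cloudflare-api-token` and the key `api-token` are hypothetical, and the sketch assumes the referenced store is already scoped to the Infisical project, environment and folder that hold CERT_MANAGER_CF_TOKEN (see IAM-0005 for the actual store definition).

```yaml
# Illustrative ExternalSecret; resource names are assumptions, not the real manifest.
# Older ESO releases serve this kind as external-secrets.io/v1beta1 instead.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: cloudflare-api-token        # hypothetical name
  namespace: cert-manager           # ClusterIssuers read Secrets from cert-manager's own namespace by default
spec:
  refreshInterval: 1h               # matches the 1-hour refresh interval cited in this ADR
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical                 # hypothetical store, assumed scoped to the Infisical path above
  target:
    name: cloudflare-api-token      # Kubernetes Secret that ESO creates and keeps in sync
  data:
    - secretKey: api-token          # key inside the resulting Secret (hypothetical)
      remoteRef:
        key: CERT_MANAGER_CF_TOKEN  # secret name within the Infisical scope
```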
Two ClusterIssuer resources are maintained:
| Issuer | ACME server | Purpose |
|---|---|---|
| letsencrypt-staging | acme-staging-v02.api.letsencrypt.org | Development / testing (untrusted CA, much higher rate limits) |
| letsencrypt-prod | acme-v02.api.letsencrypt.org | Production workloads (trusted CA, rate-limited) |
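For reference, the two issuers could be declared roughly as follows. This is a sketch under assumptions: the contact email, the ACME account key Secret names, and the token Secret name and key (`cloudflare-api-token` / `api-token`) are placeholders, not values taken from the platform repository.

```yaml
# Illustrative ClusterIssuer pair; names marked "hypothetical" are assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com                  # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key   # hypothetical; stores the ACME account key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token      # hypothetical Secret populated by ESO
              key: api-token
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                  # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key      # hypothetical; separate account key per environment
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token      # same token Secret serves both issuers
              key: api-token
```

The only functional difference between the two resources is the ACME directory URL, which is what makes promotion from staging to prod a one-line change in issuerRef.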
Phase 1 scope (this ADR)
Per-hostname certs only. The wildcard *.peries.ca cert remains in NPM on caneast-site1-node2 for reverse-proxy
services not yet migrated to k3s Ingress. No wildcard cert is issued via cert-manager in Phase 1.
Phase 2 (future)
Migrate wildcard issuance to cert-manager; decommission NPM's ACME renewal. A follow-up WI will track this.
Options Considered
Option A: Continue using NPM wildcard + manual Kubernetes Secret copy (rejected)
- Manual drift risk: Secret must be re-copied on every 90-day renewal.
- No audit trail on the Kubernetes Secret; NPM renewal logs are separate from cluster state.
- Does not scale to multiple per-hostname certs.
Option B: cert-manager with HTTP-01 challenge (rejected in favor of DNS-01)
- Requires publicly reachable ingress on port 80 per hostname.
- Incompatible with internal-only hostnames and *.peries.ca (not publicly routed).
- DNS-01 works for both public and private hostnames and supports both zones uniformly.
Option C: cert-manager with DNS-01 via Cloudflare API token (selected)
- Fully automated issuance and renewal (90-day certs; cert-manager renews about 30 days before expiry by default).
- No public ingress required — challenge solved via Cloudflare API.
- Works for both peries.ca and peries.ca zones.
- Token scoped to minimum permissions: Zone DNS Edit + Zone Read, two zones only.
- Token never committed to Git — pulled from Infisical by ESO at runtime.
Rationale
DNS-01 via Cloudflare is the least-invasive challenge type for this homelab: no port forwarding, no public exposure of challenge endpoints. cert-manager is a CNCF project and the de facto standard for Kubernetes TLS lifecycle management, and it integrates natively with Ingress and Gateway API resources.
The two-issuer pattern (staging + prod) is a cert-manager best practice. The staging environment has much higher rate limits, so test issuance does not consume Let's Encrypt production rate limits. All new cert workflows should be validated
against staging before switching issuerRef to prod.
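A per-hostname Certificate validated against staging might look like the sketch below. The hostname `app.peries.ca`, the namespace and the resource names are hypothetical examples chosen for illustration.

```yaml
# Illustrative per-hostname Certificate; hostname and names are hypothetical.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-peries-ca
  namespace: default
spec:
  secretName: app-peries-ca-tls   # Secret where cert-manager stores the key pair
  dnsNames:
    - app.peries.ca               # single hostname, no wildcard (Phase 1 scope)
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-staging     # change to letsencrypt-prod once issuance succeeds
```

Once the staging certificate reaches the Ready condition, changing issuerRef.name to letsencrypt-prod should cause cert-manager to request a trusted certificate into the same Secret.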
ESO integration (IAM-0005) ensures the Cloudflare token follows the platform's single-secrets-manager
principle (IAM-0001) with automatic rotation support: updating the token in Infisical propagates to the
cluster within the 1-hour refreshInterval.
Consequences
- All k3s Ingress and Certificate resources should reference letsencrypt-staging during development, letsencrypt-prod when promoted.
- Cert-manager's Prometheus ServiceMonitor is enabled; scraping is handled by the existing kube-prometheus-stack (OBS-0001).
- Token rotation is a follow-up action: create a new Cloudflare token and update it in Infisical; ESO then propagates it to the cluster within 1h. No cluster restarts are required.
- Phase 2 wildcard migration to cert-manager requires a follow-up WI and ADR addendum.
- installCRDs (deprecated) is replaced with crds.enabled: true in values.yaml.
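For Ingress resources, the issuer reference is usually expressed through cert-manager's cluster-issuer annotation rather than a standalone Certificate. A minimal sketch, assuming a hypothetical `demo` Service on port 80, a hypothetical hostname, and Traefik as the ingress class (the k3s default, which may not match this cluster):

```yaml
# Illustrative Ingress; hostname, Service and ingress class are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
  namespace: default
  annotations:
    # cert-manager watches this annotation and creates a Certificate for the tls block.
    cert-manager.io/cluster-issuer: letsencrypt-staging   # letsencrypt-prod when promoted
spec:
  ingressClassName: traefik        # assumption: k3s default ingress controller
  tls:
    - hosts:
        - demo.peries.ca           # hypothetical hostname
      secretName: demo-peries-ca-tls
  rules:
    - host: demo.peries.ca
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo         # hypothetical backend Service
                port:
                  number: 80
```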
References
- IAM-0001 — Infisical secrets management
- IAM-0005 — External Secrets Operator (ESO injects Cloudflare token into cluster)
- OBS-0001 — IT observability stack (Prometheus ServiceMonitor scraping)
- cert-manager docs: https://cert-manager.io/docs/
- Cloudflare DNS-01 solver: https://cert-manager.io/docs/configuration/acme/dns01/cloudflare/
- Helm chart: jetstack/cert-manager v1.20.2