DEV-0004 — Split CI and Public Deploy Pipelines for archon-docs

| Field | Value |
| --- | --- |
| ID | DEV-0004 |
| Date | 2026-05-03 |
| Status | Accepted |
| Author | Ben Peries |
| Class | it/DEV |
| Tags | pipeline, ci, cloudflare, archon-docs, wrangler |
| WI | WI-410 |

Context

archon-docs used a single azure-pipelines.yml that ran DLP tests, sanitization, the MkDocs build, and a wrangler deploy to Cloudflare Pages on every push to main. On 2026-05-02, two PRs merged nine minutes apart triggered back-to-back wrangler deploys that hit the Cloudflare Pages API rate limit (HTTP 429 on GET /pages/projects/peries-ca-docs). All build steps passed in both runs; only the deploy step failed. A retryCountOnTaskFailure: 2 band-aid (WI-409, PR #415) was applied on a branch as an interim mitigation but was never merged to main.
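For reference, the interim band-aid would look roughly like the fragment below. This is a sketch, not the content of PR #415: the deploy command line, the site/ directory, and the display name are assumptions; only retryCountOnTaskFailure: 2 and the project name come from this ADR.

```yaml
# Hypothetical deploy step from the monolithic azure-pipelines.yml.
steps:
  - script: npx wrangler pages deploy site/ --project-name=peries-ca-docs  # assumed command
    displayName: Deploy to Cloudflare Pages
    retryCountOnTaskFailure: 2  # re-run this step up to 2 more times if it fails
```

A retry only re-issues the same request after a failure. Every push to main would still start a deploy, so closely spaced merges could still collide with the rate limit.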

The root cause is architectural: commit cadence was directly driving public deploy cadence. CI validation (does the content build cleanly?) and public publication (is this content ready for the portfolio audience?) are different operations with different failure modes and different trigger semantics.

Why this matters beyond the 429

  • Every internal plumbing commit (Ansible role fix, k3s manifest update, ADR addendum) triggers a public Cloudflare deploy under the monolithic model.
  • A deploy failure (rate limit, Cloudflare outage, wrangler regression) blocks the CI feedback signal — developers cannot tell whether the build itself is healthy.
  • Future: a ccagnt-marketing agent (WI-411) will evaluate diffs against a publication rubric and trigger deploys when content is portfolio-signal-worthy, replacing the manual operator step without losing intentionality.

Why not amend APPSEC-0002

APPSEC-0002 covers DLP content controls: what is sanitized, how it is verified, and what is excluded from the public build. It is a security-class ADR. Pipeline trigger semantics and CI/CD topology are developer workflow decisions, not security decisions, so amending APPSEC-0002 is not appropriate.

Decision

Split into two pipeline files in the archon-docs repository root:

pipeline-ci.yml — CI pipeline (triggers on every push to main / every PR)

Validates that content is build-clean. Steps:

  1. Install Python dependencies (venv)
  2. DLP pytest — python3 -m pytest tests/ -v
  3. Clean previous build artifacts
  4. Sanitize — python3 sanitize.py
  5. Assert excluded files absent from build/docs/
  6. Verify sanitization — python3 verify-sanitization.py
  7. Pull device docs from archon-apps
  8. Re-verify sanitization (includes device docs)
  9. Copy assets and stylesheets
  10. Compute version from _index.md
  11. Inject version into mkdocs.yml copyright
  12. Generate AI context (_context.md)
  13. MkDocs build (--site-dir site/)
  14. Echo: "CI passed — docs are build-clean. Run pipeline-publish to deploy."

No Cloudflare credentials. No wrangler. Fast feedback on every commit.
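The trigger block and the steps whose commands are named above could be sketched as follows. This is an illustrative fragment, not the repository's actual file: the hosted Ubuntu pool, the requirements.txt file, the .venv path, and the display names are assumptions, and steps whose commands this ADR does not spell out are left as comments.

```yaml
# pipeline-ci.yml (sketch under the assumptions stated above)
trigger:
  branches:
    include:
      - main

# The YAML pr trigger applies to GitHub and Bitbucket Cloud repositories.
# For Azure Repos Git, PR validation is configured as a branch policy instead.
pr:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest  # assumption

steps:
  - script: |
      python3 -m venv .venv
      .venv/bin/pip install -r requirements.txt
    displayName: Install Python dependencies (venv)

  - script: .venv/bin/python3 -m pytest tests/ -v
    displayName: DLP pytest

  # Step 3 (clean previous build artifacts): command not given in this ADR.

  - script: .venv/bin/python3 sanitize.py
    displayName: Sanitize

  # Step 5 (assert excluded files absent from build/docs/): command not given.

  - script: .venv/bin/python3 verify-sanitization.py
    displayName: Verify sanitization

  # Steps 7-12 (device docs, re-verify, assets, version, AI context): not given.

  - script: .venv/bin/mkdocs build --site-dir site/
    displayName: MkDocs build

  # Step 14 (final status echo) omitted from this sketch.
```

The file defines no Cloudflare variables and no wrangler step, so a Cloudflare outage or rate limit cannot fail a CI run.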

pipeline-publish.yml — Publish pipeline (manual trigger only)

Deploys the built site to Cloudflare Pages. Steps:

  1. Install Python dependencies (venv)
  2. Clean previous build artifacts
  3. Sanitize — python3 sanitize.py
  4. Assert excluded files absent
  5. Verify sanitization — python3 verify-sanitization.py
  6. Pull device docs from archon-apps
  7. Re-verify sanitization
  8. Copy assets and stylesheets
  9. Compute version from _index.md
  10. Inject version into mkdocs.yml copyright
  11. Generate AI context
  12. MkDocs build
  13. Deploy via npx wrangler@4 pages deploy
  14. Echo: "Deployed to docs.peries.ca — $(date)"

Trigger: none. Manual dispatch only (ADO UI or az pipelines run). No schedule. No PR trigger. Requires explicit operator intent.
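A minimal sketch of the trigger settings and the deploy step, under stated assumptions: the pool, the site/ output directory, and the ADO secret variable names are hypothetical, as is the registered pipeline name in the closing comment. CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are the environment variables wrangler reads for authentication; the project name is taken from the API path quoted in the Context section.

```yaml
# pipeline-publish.yml (sketch under the assumptions stated above)
trigger: none   # no CI trigger on push
pr: none        # no PR trigger
# No schedules block, so nothing runs on a timer.

pool:
  vmImage: ubuntu-latest  # assumption; Node.js must be available for npx

steps:
  # Steps 1-12 (install, sanitize, verify, build) are omitted from this sketch.

  - script: npx wrangler@4 pages deploy site/ --project-name=peries-ca-docs
    displayName: Deploy to Cloudflare Pages
    env:
      # Secret variables are not exposed to scripts unless mapped explicitly.
      CLOUDFLARE_API_TOKEN: $(CLOUDFLARE_API_TOKEN)
      CLOUDFLARE_ACCOUNT_ID: $(CLOUDFLARE_ACCOUNT_ID)

# Manual dispatch from the CLI (azure-devops extension), assuming the pipeline
# is registered under the hypothetical name "archon-docs-publish":
#   az pipelines run --name archon-docs-publish --branch main
```

With both trigger and pr set to none and no schedule defined, the only way to start a run is an explicit dispatch from the ADO UI or the CLI.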

azure-pipelines.yml — retired

The original monolithic pipeline file is retired in the same commit: its trigger is set to trigger: none, and a header comment marks it as retired. The file is retained temporarily for reference while the two new pipelines are registered in ADO, then deleted.
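The retired file's header might look like the fragment below. The comment wording is an assumption; only trigger: none and the presence of a header comment come from this ADR.

```yaml
# RETIRED: superseded by pipeline-ci.yml and pipeline-publish.yml (DEV-0004, WI-410).
# Retained temporarily for reference during ADO pipeline registration, then deleted.
trigger: none

# Original steps remain below, unchanged, until the file is removed.
```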

Consequences

  • Commit cadence no longer drives deploy cadence.
  • Cloudflare API rate limit risk is eliminated at normal sprint velocity (two PRs in 9 minutes no longer trigger two deploys).
  • A failed deploy no longer pollutes the CI signal.
  • Operator must explicitly trigger a public deploy. Adds one az pipelines run step per publication cycle.
  • Future: ccagnt-marketing agent (WI-411) eliminates the manual step without losing intentionality.

Alternatives considered

| Option | Verdict |
| --- | --- |
| retryCountOnTaskFailure only (WI-409 band-aid) | Treats symptom, not cause. Commit cadence still drives deploy cadence. Rejected as long-term solution. |
| Tag-based deploy trigger | Cleaner semantics than manual but adds git-tag discipline overhead for a solo operator. Deferred to future sprint if manual becomes burdensome. |
| Schedule-based deploy (e.g., nightly) | Low friction but loses intentionality: deploys even when nothing has changed. Rejected. |
| Keep monolithic pipeline, add rate-limit backoff | Belt-and-suspenders but doesn't decouple CI from deploy. Rejected. |

References

  • WI-409: interim retryCountOnTaskFailure band-aid (superseded by this ADR)
  • WI-410: implementation WI for this change
  • WI-411: ccagnt-marketing agent design (follow-on, unblocked after WI-410)
  • APPSEC-0002: DLP controls for public docs content (unchanged by this ADR)
  • GOV-0001: WI-first branching policy