
Pre-Commit Hooks — archon-docs

Pre-commit hooks catch DLP (data loss prevention) violations locally before a push reaches the CI pipeline.

What the hook does

The hook runs sanitize.py and then verify-sanitization.py before every commit to archon-docs. If the DLP scan finds leaked IPs, UUIDs, node names, or credentials, verify-sanitization.py exits non-zero and the hook blocks the commit.

Install

Run once from the root of the archon-docs repo. This overwrites any existing .git/hooks/pre-commit file:

cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
set -e

# DLP pre-commit hook for archon-docs
# Runs sanitize.py then verify-sanitization.py.
# Commit is blocked if any internal strings are detected.

echo "[pre-commit] Running sanitization DLP check..."

# Capture sanitize.py output so it is only shown when the script fails.
if ! sanitize_log=$(python3 sanitize.py 2>&1); then
  echo "[pre-commit] FAIL: sanitize.py exited non-zero"
  echo "$sanitize_log"
  exit 1
fi

if ! python3 verify-sanitization.py; then
  echo "[pre-commit] FAIL: DLP leak detected — commit blocked"
  exit 1
fi

echo "[pre-commit] PASS — no DLP leaks detected"
EOF
chmod +x .git/hooks/pre-commit

What is checked

The hook checks the same LEAK_PATTERNS as the ADO pipeline gate:

| Pattern | Example |
|---|---|
| RFC1918 IPs | 192.168.2.x, 10.x.x.x |
| Node names | caneast-site1-node2, caneast-site1-mqtt1 |
| Hyphenated UUIDs | Infisical project/identity IDs |
| Personal identifiers | operator, caneast-site1-ai1, Ben Peries |
| Hardware identifiers | CanEast AI Node, <device-model>, REDACTED |
| Service accounts | ansible-svc-account |
| Credentials | TOKEN=REDACTED, KEY=REDACTED, password: |
| Network identifiers | dmz-bridge-0, lan-bridge, port REDACTED |
| MQTT topics with internal prefix | caneast/ot |
| ADO org URL | dev.azure.com/caneast-platform |
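The table lists categories rather than the literal expressions, so the sketch below only illustrates how a scanner in the spirit of verify-sanitization.py might match a few of them. The regular expressions, the helpers `is_rfc1918`, `find_leaks`, and `scan_tree`, and the assumption that sanitized output lands as Markdown files under build/docs/ are all hypothetical; the real LEAK_PATTERNS are not shown on this page. Private IPs are found with a broad IPv4 regex and then narrowed with Python's `ipaddress` module, which is easier to audit than a single range-matching regex.

```python
"""Sketch of a regex-based leak scan in the spirit of verify-sanitization.py.

HYPOTHETICAL: the real LEAK_PATTERNS are not reproduced on this page. The
patterns below are illustrative stand-ins for a few rows of the table.
"""
from __future__ import annotations

import ipaddress
import re
from pathlib import Path

# RFC 1918 private ranges; candidate IPv4 strings are narrowed to these.
RFC1918_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# HYPOTHETICAL stand-ins for a subset of LEAK_PATTERNS (label -> regex).
LEAK_PATTERNS = {
    "Hyphenated UUID": re.compile(
        r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
        re.IGNORECASE,
    ),
    "Node name": re.compile(r"\bcaneast-site\d+-[a-z]+\d*\b"),
    # Flags TOKEN=/KEY=/password: only when the value is not already REDACTED.
    "Credential": re.compile(
        r"(?:TOKEN|KEY|password)[ \t]*[=:][ \t]*(?!REDACTED\b)\S+",
        re.IGNORECASE,
    ),
}


def is_rfc1918(text: str) -> bool:
    """Return True if text parses as an IPv4 address in a private range."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return False
    return any(addr in net for net in RFC1918_NETS)


def find_leaks(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs for every suspected leak."""
    hits = [
        ("RFC1918 IP", m.group())
        for m in IPV4_RE.finditer(text)
        if is_rfc1918(m.group())
    ]
    for label, pattern in LEAK_PATTERNS.items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits


def scan_tree(root: Path) -> int:
    """Print each leak found in root/**/*.md and return the total count."""
    count = 0
    for path in sorted(root.rglob("*.md")):
        for label, value in find_leaks(path.read_text(encoding="utf-8")):
            print(f"LEAK [{label}] {value!r} in {path}")
            count += 1
    return count
```

A caller could exit 1 whenever `scan_tree(Path("build/docs"))` returns a positive count, which mirrors how the hook treats a non-zero exit from verify-sanitization.py. The credential pattern deliberately skips values that sanitization has already replaced with REDACTED.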

Excluded files

sanitize.py never copies these to build/docs/:

  • docs/_index.md — machine context index
  • docs/_context.md — AI portability file with full inventory
  • docs/internal/ — sensitive runbooks and identity details

The hook verifies these are absent from build/docs/ after sanitization runs.
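The absence check for excluded files can be very small. The sketch below assumes, without confirmation from sanitize.py, that build/docs/ mirrors the layout of docs/, so an excluded source would reappear at the same relative path. `EXCLUDED` and `excluded_leaks` are hypothetical names, not part of the archon-docs tooling.

```python
"""Sketch: confirm excluded sources never reach build/docs/.

ASSUMPTION: sanitize.py mirrors docs/<path> to build/docs/<path>, so an
excluded source maps to the same relative path under build/docs/.
"""
from __future__ import annotations

from pathlib import Path

# HYPOTHETICAL constant: excluded paths relative to docs/, from the list above.
EXCLUDED = ("_index.md", "_context.md", "internal")


def excluded_leaks(build_root: Path) -> list[Path]:
    """Return every excluded file or directory that exists under build_root."""
    return [build_root / rel for rel in EXCLUDED if (build_root / rel).exists()]
```

A verification step could fail the run whenever `excluded_leaks(Path("build/docs"))` returns a non-empty list.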

Relationship to CI pipeline

The ADO pipeline runs the same checks on every push to main. The pre-commit hook is a local fast-fail check and does not replace the pipeline gate; both must pass.

See ADR-0018 — Sanitization Verification Strategy and ADR-0031 — Public Docs Security Controls.