
OT-0005 OT Grafana Dashboard Taxonomy

Field | Value
ID | OT-0005
Date | 2026-05-02
Status | Proposed
Deciders | Ben Peries
Migrated from | OT-0009 (2026-04-14)

Context

The OT Grafana instance (on caneast-site1-node2:3002) hosts dashboards for OT telemetry, alarms, and historian data. As the number of sensor nodes and metric streams grows, an unstructured dashboard list becomes hard to navigate. A consistent folder hierarchy and UID scheme are needed before the dashboard count exceeds ~10.

Decision

Folder Hierarchy (5 tiers)

Folder | UID | Audience | Contents
archon-platform | archon-platform | All | Platform home; links to OT and IT Grafana
ot-operations | ot-operations | Operators | Live sensor status, active alarms
ot-engineering | ot-engineering | Engineers | Calibration, debug, raw signal views
ot-infrastructure | ot-infrastructure | Platform | Node health, MQTT broker, InfluxDB
ot-historian | ot-historian | All | Long-term trends, seasonal comparisons

The archon-platform folder contains the org home dashboard (ot-home-archon), which is the default landing page for the OT Grafana instance.

UID Pattern

ot-{tier}-{node}-{purpose}

Examples:

UID | Dashboard
ot-operations-snr01-sumppit | Live sump-pit status (snr01)
ot-historian-snr01-level-trend | Annual level trend (snr01)
ot-engineering-snr01-raw | Raw signal debug (snr01)
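The pattern can be checked mechanically before a dashboard is imported. The following is a minimal sketch, not part of the decision itself. It assumes that node IDs are lowercase alphanumerics without hyphens (as in snr01), that the purpose may contain hyphens (as in level-trend), and that only the four ot-* tiers carry per-node dashboards. The names parse_uid, DashboardUid, and MAX_UID_LEN are hypothetical, and the 40-character limit reflects Grafana's documented UID length cap, which should be verified against the deployed version.

```python
import re
from typing import NamedTuple, Optional

# Tiers taken from the folder table. archon-platform is left out because its
# home dashboard (ot-home-archon) does not follow the per-node pattern.
TIERS = ("operations", "engineering", "infrastructure", "historian")

# Assumption: Grafana limits dashboard UIDs to 40 characters.
MAX_UID_LEN = 40

# Assumption: node IDs contain no hyphens, so the first hyphen after the tier
# separates the node from the purpose; the purpose may itself be hyphenated.
UID_RE = re.compile(
    r"^ot-(?P<tier>" + "|".join(TIERS) + r")"
    r"-(?P<node>[a-z0-9]+)"
    r"-(?P<purpose>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


class DashboardUid(NamedTuple):
    tier: str
    node: str
    purpose: str


def parse_uid(uid: str) -> Optional[DashboardUid]:
    """Return the parts of a dashboard UID, or None if it breaks the pattern."""
    if len(uid) > MAX_UID_LEN:
        return None
    m = UID_RE.match(uid)
    if m is None:
        return None
    return DashboardUid(m["tier"], m["node"], m["purpose"])
```

For example, parse_uid("ot-historian-snr01-level-trend") yields tier historian, node snr01, and purpose level-trend, while ot-home-archon is rejected because it is not a per-node dashboard.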

Panel Pattern — Event-Driven Data

OT metrics are sparse (state-change events, not fixed-interval polling). The correct Grafana panel pattern for event-driven data is:

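// Combines: last known value before the window + all events within the window.
// lastBefore must be earlier than windowStart; it bounds how far back to look.
// stop: windowStart keeps last() from returning an event inside the window.
union(
  tables: [
    from(bucket: "homelab") |> range(start: lastBefore, stop: windowStart) |> last(),
    from(bucket: "homelab") |> range(start: windowStart, stop: windowEnd)
  ]
)
  |> sort(columns: ["_time"])  // union() does not guarantee row order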

Use stepAfter line interpolation for state-change metrics (flood, door_state) so the panel holds the last known value flat until the next event instead of drawing a sloped line between events.
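To see why the seed point matters, the pattern can be mimicked outside Grafana. The sketch below is a plain-Python illustration, not the query the panels run. The event values and the helper names window_with_seed and value_at are hypothetical, and the window is treated as start-inclusive and stop-exclusive, matching Flux's range().

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, state value)


def window_with_seed(events: List[Event], start: float, stop: float) -> List[Event]:
    """Mimic union(last event before the window, events inside the window).

    Expects events sorted by timestamp.
    """
    before = [e for e in events if e[0] < start]
    inside = [e for e in events if start <= e[0] < stop]
    seed = before[-1:]  # empty when no earlier event exists
    return seed + inside


def value_at(points: List[Event], t: float) -> Optional[int]:
    """Step-after lookup: hold the most recent value at or before t."""
    times = [p[0] for p in points]
    i = bisect_right(times, t)
    return points[i - 1][1] if i > 0 else None


# Hypothetical door_state events: opened at t=50, closed at t=130.
events = [(10, 0), (50, 1), (130, 0)]

naive = [e for e in events if 100 <= e[0] < 200]  # window-only query
seeded = window_with_seed(events, 100, 200)       # window plus seed point
```

With the window-only query, the state at t=110 is unknown because the first visible event arrives at t=130. With the seed point, the step-after lookup reports the door as open (1) from the start of the window until it closes.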

Options Considered

Option A: Flat dashboard list (rejected)

No folder structure. Works below ~10 dashboards; breaks as node count grows. Navigation requires scrolling a flat list with no audience segmentation.

Option B: Per-node folders (rejected)

One folder per sensor node (snr01/, snr02/). Cross-node comparison dashboards have no natural home, and the folder count grows with the number of nodes, O(nodes), instead of staying fixed at the number of roles.

Option C: Role-based 5-tier hierarchy (selected)

Aligns with ISA-95 role separation (operators vs. engineers vs. platform admins). Folder count is fixed at 5 regardless of node count. Cross-node dashboards (e.g., all-zone alarm summary) belong naturally in ot-operations.

Rationale

The 5-tier hierarchy mirrors enterprise SCADA display conventions without requiring formal ISA-101 compliance. Separating operator, engineering, and historian views prevents dashboard clutter in the primary operations view. The UID pattern makes dashboards addressable by code (Ansible, Terraform) without relying on numeric IDs that change on import/export cycles.
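As an illustration of addressing a dashboard by UID, the sketch below builds the two URLs a script would typically need. The host and port come from the Context section; the http scheme and the helper names are assumptions. The paths follow Grafana's get-dashboard-by-UID HTTP API endpoint and its /d/<uid> browser route.

```python
from urllib.parse import quote

# Host and port are from the Context section; the http scheme is an assumption.
BASE_URL = "http://caneast-site1-node2:3002"


def dashboard_api_url(uid: str, base_url: str = BASE_URL) -> str:
    """URL of the HTTP API endpoint that returns a dashboard by UID."""
    return f"{base_url}/api/dashboards/uid/{quote(uid, safe='')}"


def dashboard_view_url(uid: str, base_url: str = BASE_URL) -> str:
    """Browser URL for a dashboard, using the /d/<uid> route."""
    return f"{base_url}/d/{quote(uid, safe='')}"
```

Because the UID is stable across export and import, both URLs stay valid when a dashboard is re-provisioned, which would not hold for the numeric ID.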

The event-driven panel pattern avoids the "gaps between points" problem that occurs when standard Grafana time-series panels assume fixed-interval data.

Consequences

  • All new OT dashboards must be placed in one of the five folders.
  • Dashboard UIDs must follow the ot-{tier}-{node}-{purpose} pattern.
  • The archon-platform folder and ot-home-archon dashboard are the entry point; navigation links must be maintained when new dashboards are added.
  • State-change metrics must use stepAfter interpolation and the union+lastBefore query pattern to avoid misleading gaps in the time series.
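These rules lend themselves to an automated check before a dashboard is pushed. The following is a rough sketch under several assumptions: the input is shaped like a Grafana dashboard import payload (a "dashboard" object plus a "folderUid"), state-change panels are identified by a caller-supplied set of panel titles (a hypothetical convention), and ot-home-archon is treated as the one exception to the UID pattern. It does not check navigation links or the query text.

```python
import re
from typing import Dict, FrozenSet, List

# The five folder UIDs from the folder hierarchy.
FOLDERS = frozenset({
    "archon-platform", "ot-operations", "ot-engineering",
    "ot-infrastructure", "ot-historian",
})

# Simplified form of ot-{tier}-{node}-{purpose}; assumes hyphen-free node IDs.
UID_RE = re.compile(
    r"^ot-(operations|engineering|infrastructure|historian)-[a-z0-9]+-[a-z0-9-]+$"
)
HOME_UID = "ot-home-archon"  # assumed exception: the org home dashboard


def lint(payload: Dict, state_change_titles: FrozenSet[str] = frozenset()) -> List[str]:
    """Return short codes for each taxonomy rule the payload violates."""
    problems = []
    dash = payload.get("dashboard", {})
    uid = dash.get("uid", "")
    if payload.get("folderUid") not in FOLDERS:
        problems.append("folder")
    if uid != HOME_UID and not UID_RE.match(uid):
        problems.append("uid")
    for panel in dash.get("panels", []):
        if panel.get("title") not in state_change_titles:
            continue
        # Time series panels store line interpolation under fieldConfig.
        custom = panel.get("fieldConfig", {}).get("defaults", {}).get("custom", {})
        if custom.get("lineInterpolation") != "stepAfter":
            problems.append(f"interpolation:{panel.get('title')}")
    return problems
```

A check like this could run in CI against exported dashboard JSON, returning an empty list when a dashboard conforms.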

References