ADR-0027 — External-observer evidence model
- Status: Accepted
- Date: 2026-05-08
- Deciders: Natan
- Source: Conversation pinning SQA's value proposition. The
product is evidence of gaps between system claims and system behavior, not "tests passed."
Context
SQA is an external observer of running systems. It does not import the SUT's code; it only sees what the SUT wrote to the world (databases, object stores, log streams, analytics warehouses). Three properties follow:
- Evidence weight comes from independence. A pass from a verifier that imported the SUT's code proves nothing about production: the SUT's own writers are validating themselves. A pass from a verifier that read Mongo / S3 / ClickHouse / Loki as a third party does prove something.
- The deliverable is the gap, not the result. Two distinct audiences:
  - Operators / on-call read Result outcomes (pass/warn/fail/error/skip) to decide whether to page someone now (ADR-0012/0020).
  - The SUT-owning team reads gap rows: auditable evidence of specific divergences between what the SUT claims it does and what it actually does. They use these rows to prioritize fixes.

  These are related but separate outputs. A single SQA run produces both.
- The gap is content-level, not presence-level. "S3 has a
robots.txt.gz key" is not evidence the SUT works. "The gzipped body parses as robots.txt with at least one User-agent: directive" is. The verify phase must assert at the level the SUT's promise is made.
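The difference between the two assertion levels can be sketched as follows. This is a minimal illustration rather than SQA's actual verifier: `keyExists` and `hasUserAgentDirective` are hypothetical names, and the object body is assumed to have already been fetched from the store as raw bytes.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Presence-level check: only proves that something was written under the key.
function keyExists(keys: string[], key: string): boolean {
  return keys.includes(key);
}

// Content-level check: decode the body and look for what the contract promises,
// here at least one "User-agent:" directive with a value.
function hasUserAgentDirective(gzippedBody: Uint8Array): boolean {
  let text: string;
  try {
    text = gunzipSync(gzippedBody).toString("utf8");
  } catch {
    return false; // not valid gzip, so the content promise is already broken
  }
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim()) // drop robots.txt comments
    .some((line) => /^user-agent\s*:\s*\S+/i.test(line));
}

// Two bodies that would both satisfy the presence-level check.
const good = gzipSync(Buffer.from("User-agent: *\nDisallow: /private\n"));
const bad = gzipSync(Buffer.from("<html>404 Not Found</html>"));
```

Both `good` and `bad` exist as objects, so a presence-level verifier cannot tell them apart; only the content-level check separates them.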
Decision
Every scenario in SQA produces two outputs per run:
Output 1 — Result tree (existing)
The composite Result per ADR-0012/0020. Drives exit code, summary block, dashboard, on-call paging. One Result per phase; aggregated up the tree.
Output 2 — Gap log (new)
Per-run, persisted alongside the run's logs: runs/<traceId>/gaps.json (single JSON array, append-on-write).
Each row:

    interface GapRow {
      runId: string;      // SQA trace id
      ts: string;         // ISO 8601
      scenario: string;   // e.g. "snappy.domain-activation"
      claim: string;      // contract claim id, e.g. "C4"
      observed: string;   // what we saw
      expected: string;   // what the contract said
      evidence: {         // enough for the SUT team to reproduce
        excerpt?: string; // first ~200 bytes of decoded artifact
      };
    }

A gap row is appended only when:
- The artifact under inspection was reachable (not skipped).
- Its content failed to match the contract.
Specifically not on:
- Skipped phases (env not configured, prereq not present).
- Network errors (error outcome: a probe-side bug, not a SUT gap).
- Missing artifacts where absence is contractually allowed
(e.g. domain has no robots.txt → no S3 key expected).
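The append rules above can be read as a small decision function. The sketch below is one hypothetical encoding: `Inspection` and `shouldAppendGap` are illustrative names, not part of SQA. Treating a missing artifact whose absence is not allowed as a gap is an assumption; the ADR only states the allowed-absence case.

```typescript
// What a verify phase learned about one artifact (hypothetical shape).
type Inspection =
  | { kind: "skipped"; reason: string }          // env not configured, prereq not present
  | { kind: "probe-error"; message: string }     // network failure on the SQA side
  | { kind: "absent"; absenceAllowed: boolean }  // artifact was not found
  | { kind: "read"; matchesContract: boolean };  // artifact was fetched and decoded

function shouldAppendGap(inspection: Inspection): boolean {
  switch (inspection.kind) {
    case "skipped":
      return false; // nothing was observed, so there is nothing to report
    case "probe-error":
      return false; // the error outcome is a probe-side problem, not a SUT gap
    case "absent":
      // Assumption: absence is a gap only when the contract requires the artifact.
      return !inspection.absenceAllowed;
    case "read":
      return !inspection.matchesContract; // reachable, and content diverged
  }
}
```

Keeping the decision in one place makes it harder for an individual verifier to record a gap for a skip or a network error by accident.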
Phases emit gap rows from withGap(...) in lib/gap-log.ts, which both:
- Returns the existing Result envelope for the run tree.
- Appends to the per-run gap file.
The two outputs are co-emitted, never decoupled. A fail without a gap row is a misuse of the helper.
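Co-emission could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of gap-log.ts: the `Result` interface is a stand-in for the ADR-0012 envelope, the `rootDir` parameter is added so the example can run in a temporary directory, and the read-modify-write of gaps.json is not safe under concurrent writers.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Row shape as given in this ADR.
interface GapRow {
  runId: string;
  ts: string;
  scenario: string;
  claim: string;
  observed: string;
  expected: string;
  evidence: { excerpt?: string };
}

// Stand-in for the Result envelope; the real shape is defined elsewhere.
interface Result {
  outcome: "pass" | "warn" | "fail" | "error" | "skip";
  message: string;
}

// Hypothetical signature: rootDir is an addition for this sketch.
function appendGap(rootDir: string, row: GapRow): void {
  const file = join(rootDir, "runs", row.runId, "gaps.json");
  mkdirSync(dirname(file), { recursive: true });
  // The file holds a single JSON array, so appending means rewriting it.
  const rows: GapRow[] = existsSync(file)
    ? (JSON.parse(readFileSync(file, "utf8")) as GapRow[])
    : [];
  rows.push(row);
  writeFileSync(file, JSON.stringify(rows, null, 2));
}

// Records the gap first, then returns the fail, so neither exists alone.
function failWithGap(rootDir: string, row: GapRow): Result {
  appendGap(rootDir, row);
  return {
    outcome: "fail",
    message: `${row.claim}: expected ${row.expected}; observed ${row.observed}`,
  };
}

// Usage with made-up values.
const root = mkdtempSync(join(tmpdir(), "sqa-gaps-"));
const result = failWithGap(root, {
  runId: "trace-123",
  ts: new Date().toISOString(),
  scenario: "snappy.domain-activation",
  claim: "C4",
  observed: "gzipped body has no User-agent directive",
  expected: "robots.txt with at least one User-agent directive",
  evidence: { excerpt: "<html>404 Not Found</html>" },
});
```

Because the verifier only ever calls the helper, it has no code path that returns a fail without also writing the row.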
Consequences
Architectural
- New module src/lib/gap-log.ts with appendGap(row) and a helper failWithGap(...) that returns a Result and records the gap atomically.
- Each scenario directory gets a contract document at
docs/contracts/<system>/<scenario>.md (started by snappy/domain-activation.md). The document is the promise; the verifier asserts against it; gap rows reference the document's claim ids (C1, C2, …) so anyone reading a gap row can trace it back to the promise that was broken.
- Result outcomes remain the operational signal. Gap rows are
the SUT-team-facing signal. They serve different audiences; conflating them is a category error.
Behavioural
- A run with pass=16/EXIT=0 but gap rows is not a green run for the SUT team. It's a green run for the operators (run completed, no infra problems) and a list of fixes for the SUT team. The summary block prints both.
- Gap rows accumulate across runs. The contract doc's gap-log
section is a manual rollup of recurring rows; PRs to the SUT reference rows by their runId+claim and the doc's row id.
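The manual rollup of recurring rows could be assisted by grouping rows from several runs. This is a minimal sketch, assuming the rows from multiple gaps.json files have already been loaded into memory; `rollupByClaim` and the `scenario#claim` key format are hypothetical.

```typescript
// Only the fields needed for the rollup.
interface GapRef {
  runId: string;
  scenario: string;
  claim: string;
}

// Maps "scenario#claim" to the distinct runs in which that claim was broken.
function rollupByClaim(rows: GapRef[]): Map<string, string[]> {
  const byClaim = new Map<string, string[]>();
  for (const row of rows) {
    const key = `${row.scenario}#${row.claim}`;
    const runs = byClaim.get(key) ?? [];
    if (!runs.includes(row.runId)) runs.push(row.runId);
    byClaim.set(key, runs);
  }
  return byClaim;
}

// Made-up rows from two runs.
const rollup = rollupByClaim([
  { runId: "run-a", scenario: "snappy.domain-activation", claim: "C4" },
  { runId: "run-b", scenario: "snappy.domain-activation", claim: "C4" },
  { runId: "run-b", scenario: "snappy.domain-activation", claim: "C2" },
]);
```

A claim that appears in many runs is a candidate for the contract doc's gap-log section; one that appears once may be a transient.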
Falsifiability
- A future contributor adds a verify phase that returns fail without recording a gap → ADR-0027 has failed and the reviewer rejects the PR.
- A verifier returns a gap row but no Result → also wrong (gap-without-Result orphans the operational signal).
- A verifier asserts presence-only ("file exists") and never
reads content → not yet a violation of this ADR per se, but it can never produce a meaningful gap row, so the verifier is vestigial. Reviewers cite the contract doc to push for a content-level assertion.
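The first two failure modes are mechanical enough to check in a test. The sketch below assumes each phase's output can be summarized as an optional outcome plus the claim ids of the gap rows it recorded; `PhaseOutput` and `coEmissionViolations` are hypothetical names, not part of SQA.

```typescript
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

// Hypothetical per-phase summary: what the phase returned and what it logged.
interface PhaseOutput {
  phase: string;
  outcome?: Outcome;    // undefined when the phase produced no Result
  gapClaims: string[];  // claim ids of gap rows recorded by the phase
}

// Lists the phases that break the co-emission rule in either direction.
function coEmissionViolations(outputs: PhaseOutput[]): string[] {
  const violations: string[] = [];
  for (const output of outputs) {
    if (output.outcome === "fail" && output.gapClaims.length === 0) {
      violations.push(`${output.phase}: fail without a gap row`);
    }
    if (output.outcome === undefined && output.gapClaims.length > 0) {
      violations.push(`${output.phase}: gap row without a Result`);
    }
  }
  return violations;
}

// Made-up phase outputs: one correct, two violations.
const violations = coEmissionViolations([
  { phase: "verify-s3", outcome: "fail", gapClaims: ["C4"] },
  { phase: "verify-mongo", outcome: "fail", gapClaims: [] },
  { phase: "verify-loki", gapClaims: ["C2"] },
]);
```

The third failure mode, presence-only assertions, is not detectable this way and stays a matter for review against the contract doc.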
What this ADR does NOT do
- It doesn't specify the evidence-storage layer. Per-run JSON is
the simplest viable form. A future PRD can durably persist gaps to ClickHouse (analyzable across runs) or to Linear (auto-filed tickets); the appendGap interface stays the same.
- It doesn't change the Result envelope. Gap rows live next to
Results; they don't replace them.
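One way to keep the appendGap interface stable while the storage layer changes is to put the destination behind a small interface. The sketch below is an assumption about how that might be structured: `GapSink`, `MemorySink`, and `makeAppendGap` are hypothetical, and the in-memory sinks stand in for the per-run JSON file and a possible future warehouse writer.

```typescript
// Row shape as given in this ADR.
interface GapRow {
  runId: string;
  ts: string;
  scenario: string;
  claim: string;
  observed: string;
  expected: string;
  evidence: { excerpt?: string };
}

// Hypothetical abstraction: anything that can accept a gap row.
interface GapSink {
  append(row: GapRow): void;
}

// In-memory sink, used here in place of real storage.
class MemorySink implements GapSink {
  readonly rows: GapRow[] = [];
  append(row: GapRow): void {
    this.rows.push(row);
  }
}

// Builds an appendGap(row) whose signature does not depend on where rows end up.
function makeAppendGap(...sinks: GapSink[]): (row: GapRow) => void {
  return (row) => {
    for (const sink of sinks) sink.append(row);
  };
}

const perRunFile = new MemorySink(); // stands in for runs/<traceId>/gaps.json
const warehouse = new MemorySink();  // stands in for a future durable store
const appendGap = makeAppendGap(perRunFile, warehouse);

appendGap({
  runId: "trace-123",
  ts: "2026-05-08T00:00:00Z",
  scenario: "snappy.domain-activation",
  claim: "C4",
  observed: "no User-agent directive",
  expected: "at least one User-agent directive",
  evidence: {},
});
```

Verifiers call `appendGap(row)` in both cases; adding a second destination changes only how the function is constructed.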
See also
- ADR-0012: the Result envelope this ADR composes with.
- The drive-then-verify scenario shape that produces gap rows.
- PRD-07: the first scenario instrumented with gap rows. PRD-07 was deleted on 2026-05-08; the live contract is at docs/contracts/snappy/domain-activation.md.
- docs/contracts/snappy/domain-activation.md: the first contract document; pattern for future scenarios.
- Charity Majors et al., Observability Engineering (O'Reilly
2022) ch. 5 — "the answer to 'is X working' is rarely a single question". Gap rows split the answer by audience.