ADR-0027 — External-observer evidence model

Updated 2026-05-08 18:28 EDT

Status: Accepted
Date: 2026-05-08
Deciders: Natan
Source: Conversation pinning SQA's value proposition. The product is evidence of gaps between system claims and system behavior, not "tests passed."

Context #

SQA is an external observer of running systems. It does not import the SUT's code; it only sees what the SUT wrote to the world (databases, object stores, log streams, analytics warehouses). Three properties follow:

Evidence weight comes from independence. A pass from a verifier that imported the SUT's code proves nothing about production: the SUT's own writers are validating themselves. A pass from a verifier that read Mongo / S3 / ClickHouse / Loki as a third party does prove something.

The deliverable is the gap, not the result. Two distinct audiences:

Operators / on-call read Result outcomes (pass/warn/fail/error/skip) to decide whether to page someone now (ADR-0012/0020).

The SUT-owning team reads gap rows: auditable evidence of specific divergences between what the SUT claims it does and what it actually does. They use these rows to prioritize fixes.

These are related but separate outputs. A single SQA run produces both.

The gap is content-level, not presence-level. "S3 has a robots.txt.gz key" is not evidence the SUT works. "The gzipped body parses as robots.txt with at least one User-agent: directive" is. The verify phase must assert at the level at which the SUT's promise is made.
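The difference between a presence-level and a content-level check can be sketched as follows. This is a minimal illustration, not SQA's verifier: `keyExists` and `hasUserAgentDirective` are hypothetical names, the artifacts are built in memory instead of being read from S3, and "parses as robots.txt" is reduced to "contains at least one User-agent line", matched case-insensitively here.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Presence-level check: proves only that a key exists.
function keyExists(keys: string[], key: string): boolean {
  return keys.includes(key);
}

// Content-level check: decode the gzipped body and require at least one
// "User-agent:" line. Simplified stand-in for a real robots.txt parse.
function hasUserAgentDirective(gzippedBody: Uint8Array): boolean {
  let text: string;
  try {
    text = gunzipSync(gzippedBody).toString("utf8");
  } catch {
    return false; // not valid gzip, so the content cannot match the contract
  }
  return text.split(/\r?\n/).some((line) => /^\s*user-agent\s*:/i.test(line));
}

// Hypothetical artifacts standing in for what a verifier would read from S3.
const goodBody = gzipSync(Buffer.from("User-agent: *\nDisallow: /private\n"));
const emptyBody = gzipSync(Buffer.from(""));

// The presence check is satisfied no matter which body sits behind the key.
const presenceOnly = keyExists(["robots.txt.gz"], "robots.txt.gz");
const goodContent = hasUserAgentDirective(goodBody);
const emptyContent = hasUserAgentDirective(emptyBody);
console.log({ presenceOnly, goodContent, emptyContent });
```

The empty body passes the presence check and fails the content check, which is the kind of divergence a gap row is meant to capture.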

Decision #

Every scenario in SQA produces two outputs per run:

Output 1 — Result tree (existing) #

The composite Result per ADR-0012/0020. Drives exit code, summary block, dashboard, on-call paging. One Result per phase; aggregated up the tree.

Output 2 — Gap log (new) #

Per-run, persisted alongside the run's logs: runs/<traceId>/gaps.json (single JSON array, append-on-write).

Each row:

interface GapRow {
  runId: string;          // SQA trace id
  ts: string;             // ISO 8601
  scenario: string;       // e.g. "snappy.domain-activation"
  claim: string;          // contract claim id, e.g. "C4"
  observed: string;       // what we saw
  expected: string;       // what the contract said
  evidence: {             // enough for the SUT team to reproduce
    excerpt?: string;     // first ~200 bytes of decoded artifact
  };
}

A gap row is appended only when:

The artifact under inspection was reachable (not skipped).
Its content failed to match the contract.

Specifically, no row is appended for:

Skipped phases (env not configured, prereq not present).
Network errors (error outcome: a probe-side bug, not a SUT gap).
Missing artifacts where absence is contractually allowed (e.g. domain has no robots.txt → no S3 key expected).
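The append rules above can be read as a single predicate. The sketch below assumes a hypothetical `Observation` type that classifies what a verify phase saw; neither the type nor `shouldRecordGap` is part of SQA. An artifact that is missing where the contract requires it is not covered by the rules above, so it is left out of the sketch.

```typescript
// Hypothetical classification of what a verify phase observed.
type Observation =
  | { kind: "skipped" }                         // env not configured, prereq not present
  | { kind: "probe-error" }                     // network failure on the probe side
  | { kind: "absent-allowed" }                  // contract permits the artifact to be missing
  | { kind: "read"; contentMatches: boolean };  // artifact was reachable and inspected

// A gap row is warranted only when the artifact was read and its content
// failed to match the contract.
function shouldRecordGap(o: Observation): boolean {
  return o.kind === "read" && !o.contentMatches;
}

const decisions = {
  skipped: shouldRecordGap({ kind: "skipped" }),
  probeError: shouldRecordGap({ kind: "probe-error" }),
  absentAllowed: shouldRecordGap({ kind: "absent-allowed" }),
  matched: shouldRecordGap({ kind: "read", contentMatches: true }),
  mismatched: shouldRecordGap({ kind: "read", contentMatches: false }),
};
console.log(decisions);
```

Only the `mismatched` case yields a gap row; every other case is handled by the Result outcome alone.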

Phases emit gap rows through failWithGap(...) in src/lib/gap-log.ts, which both:

Returns the existing Result envelope for the run tree.
Appends to the per-run gap file.

The two outputs are co-emitted, never decoupled. A fail without a gap row is a misuse of the helper.
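A minimal sketch of how such a helper could co-emit both outputs is shown below. Several parts are assumptions: the `Result` shape is a simplified stand-in for the envelope defined in ADR-0012/0020, the optional `runsDir` parameter exists only so the example can write to a temporary directory, and the read-modify-write of the JSON array is not safe against concurrent writers.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface GapRow {
  runId: string;
  ts: string;
  scenario: string;
  claim: string;
  observed: string;
  expected: string;
  evidence: { excerpt?: string };
}

// Assumed stand-in for the real Result envelope (ADR-0012/0020).
interface Result {
  outcome: "pass" | "warn" | "fail" | "error" | "skip";
  message: string;
}

// Appends one row to <runsDir>/<runId>/gaps.json, kept as a single JSON array.
function appendGap(row: GapRow, runsDir = "runs"): void {
  const dir = join(runsDir, row.runId);
  mkdirSync(dir, { recursive: true });
  const file = join(dir, "gaps.json");
  const rows: GapRow[] = existsSync(file) ? JSON.parse(readFileSync(file, "utf8")) : [];
  rows.push(row);
  writeFileSync(file, JSON.stringify(rows, null, 2));
}

// Records the gap first, then returns the fail Result, so a caller cannot
// obtain one output without the other.
function failWithGap(row: GapRow, runsDir = "runs"): Result {
  appendGap(row, runsDir);
  return {
    outcome: "fail",
    message: `${row.claim}: expected ${row.expected}, observed ${row.observed}`,
  };
}

// Usage with illustrative values, written to a throwaway directory.
const runsDir = mkdtempSync(join(tmpdir(), "sqa-runs-"));
const exampleRow: GapRow = {
  runId: "t-1",
  ts: new Date().toISOString(),
  scenario: "snappy.domain-activation",
  claim: "C4",
  observed: "gzipped body is empty",
  expected: "at least one User-agent: directive",
  evidence: { excerpt: "" },
};
const result = failWithGap(exampleRow, runsDir);
const gapFile = join(runsDir, "t-1", "gaps.json");
const stored: GapRow[] = JSON.parse(readFileSync(gapFile, "utf8"));
console.log(result.outcome, stored.length);
```

Because the only way to get the fail Result is through the call that also writes the row, the "fail without a gap row" case can only arise by bypassing the helper.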

Consequences #

Architectural #

New module src/lib/gap-log.ts with appendGap(row) and a helper failWithGap(...) that returns a Result and records the gap atomically.

Each scenario directory gets a contract document at docs/contracts/<system>/<scenario>.md (starting with snappy/domain-activation.md). The document is the promise; the verifier asserts against it; gap rows reference the document's claim ids (C1, C2, …) so anyone reading a gap row can trace it back to the promise that was broken.

Result outcomes remain the operational signal. Gap rows are the SUT-team-facing signal. They serve different audiences; conflating them is a category error.

Behavioural #

A run with pass=16/EXIT=0 that still has gap rows is not a green run for the SUT team. It's a green run for the operators (run completed, no infra problems) and a list of fixes for the SUT team. The summary block prints both.

Gap rows accumulate across runs. The contract doc's gap-log section is a manual rollup of recurring rows; PRs to the SUT reference rows by their runId+claim and the doc's row id.
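A summary block that reports both signals side by side might look like the sketch below. The output format, the `summarize` name, and the exit-code rule (non-zero only when a fail or error outcome is present) are assumptions for illustration; the real rules live in ADR-0012/0020.

```typescript
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

function summarize(
  outcomes: Outcome[],
  gapCount: number,
): { exitCode: number; text: string } {
  const counts: Record<Outcome, number> = { pass: 0, warn: 0, fail: 0, error: 0, skip: 0 };
  for (const o of outcomes) counts[o] += 1;

  // Assumption: the exit code is derived from Result outcomes only,
  // never from the number of gap rows.
  const exitCode = counts.fail + counts.error > 0 ? 1 : 0;

  const resultLine = (Object.keys(counts) as Outcome[])
    .map((k) => `${k}=${counts[k]}`)
    .join(" ");
  const gapLine = gapCount === 0 ? "gaps=0" : `gaps=${gapCount} (see gaps.json)`;
  return { exitCode, text: `${resultLine} EXIT=${exitCode}\n${gapLine}` };
}

// Sixteen passing phases plus two recorded gaps: green for operators,
// a to-do list for the SUT team.
const summary = summarize(new Array<Outcome>(16).fill("pass"), 2);
console.log(summary.text);
```

Keeping the gap count on its own line makes it hard to mistake an EXIT=0 run for a clean bill of health.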

Falsifiability #

A future contributor adds a verify phase that returns fail without recording a gap → ADR-0027 has failed and the reviewer rejects the PR.

A verifier returns a gap row but no Result → also wrong (gap-without-Result orphans the operational signal).

A verifier asserts presence-only ("file exists") and never reads content → not a violation of this ADR per se, but such a verifier can never produce a meaningful gap row, so it is vestigial. Reviewers cite the contract doc to push for a content-level assertion.

What this ADR does NOT do #

It doesn't specify the evidence-storage layer. Per-run JSON is the simplest viable form. A future PRD can durably persist gaps to ClickHouse (analyzable across runs) or to Linear (auto-filed tickets); the appendGap interface stays the same.

It doesn't change the Result envelope. Gap rows live next to Results; they don't replace them.
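One way to keep appendGap(row) stable while the storage layer changes is to hide the backend behind a small interface. The sketch below is an assumption about how that could be arranged, not a description of SQA: `GapSink`, `MemorySink`, and `makeAppendGap` are hypothetical names, and a real ClickHouse or Linear sink would most likely be asynchronous.

```typescript
interface GapRow {
  runId: string;
  ts: string;
  scenario: string;
  claim: string;
  observed: string;
  expected: string;
  evidence: { excerpt?: string };
}

// Hypothetical storage abstraction: per-run JSON, ClickHouse, or Linear
// would each provide their own implementation.
interface GapSink {
  write(row: GapRow): void;
}

// In-memory sink used here so the example has no external dependencies.
class MemorySink implements GapSink {
  readonly rows: GapRow[] = [];
  write(row: GapRow): void {
    this.rows.push(row);
  }
}

// Callers always see the same appendGap(row) signature, whatever the sink.
function makeAppendGap(sink: GapSink): (row: GapRow) => void {
  return (row) => sink.write(row);
}

const sink = new MemorySink();
const appendGap = makeAppendGap(sink);
appendGap({
  runId: "t-1",
  ts: new Date().toISOString(),
  scenario: "snappy.domain-activation",
  claim: "C4",
  observed: "gzipped body is empty",
  expected: "at least one User-agent: directive",
  evidence: {},
});
console.log(sink.rows.length);
```

Swapping the sink changes where rows land without touching any verify phase that calls appendGap.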