ADR-0012 — System checks are pure functions returning a Result envelope
- Status: Accepted
- Date: 2026-05-07
- Deciders: Natan
- Supersedes (in spirit): the "throw at the segment boundary" guidance in `docs/guides/add-a-segment.md` (segments now aggregate into a Result; they do not throw).
Context
Every layer of SQA — components, segments, system flows — is a check. Until now we encoded check outcomes inconsistently:
- Components (`src/components/<system>/ready.ts`) returned `boolean`. `true` meant "good"; `false` meant anything else: auth failure, unreachable, wrong response shape, parse error, all collapsed into one bit.
- Segments (`src/systems/snappy-api/preflight.ts`) ran `throw new Error(...)` on `false`. Same input, two different control-flow shapes (return vs throw) depending on outcome.
- The top-level runner caught those throws in a `.catch()` and exited non-zero.
Three concrete defects fell out of this:
- Lost diagnostic signal. A 403 from S3 (production rejected our IAM) and a DNS timeout (we couldn't reach S3 at all) both became `false`. Different remediations; same return value.
- Impure functions. A check that throws on one input and returns on another isn't a pure function: it has two output channels. That makes composition awkward and testing brittle.
- No way to express degraded-but-not-blocking. Loki returning `/ready` ok with zero recent labels, OpenRouter cost spiking above an alert threshold: both want to say "the system answered, but something's off." `boolean` can't express it.
Four mature peer systems were surveyed before deciding:
| System | States used |
|---|---|
| IETF Health Check Response Format draft (inadarei) | pass, warn, fail |
| Kubernetes probes | Success, Failure, Unknown |
| JUnit / TestNG | passed, failed, error, skipped, aborted |
| Terraform plugin diagnostics | Error, Warning |
JUnit's split between failed (assertion mismatch) and error (unexpected exception) is the one SQA needed and didn't have.
Decision
Every check at every layer is a pure function returning a Result envelope. No throws. No side effects beyond logging. Same input produces the same output shape, always.
The contract lives at `src/lib/result.ts`:

    export type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

    export interface Result {
      outcome: Outcome;
      name: string;
      reason?: string;
      context?: Record<string, unknown>;
      children?: Result[];
    }

The five outcomes
| Outcome | When to use |
|---|---|
| `pass` | The check ran and the system is healthy by the check's contract. |
| `warn` | The check ran, system answered, answer is degraded but not blocking. |
| `fail` | The check ran cleanly and the system answered wrongly (auth denied, wrong shape, missing required value). |
| `error` | The check could not run to completion: exception, timeout, parse failure. |
| `skip` | The check was deliberately not executed (env not configured, feature-flagged off, prerequisite not met). |
fail vs error is the key distinction: the system said no vs we couldn't ask. The remediations are different and the envelope must preserve the difference.
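
To make the distinction concrete, here is a minimal sketch of a leaf check, not the project's actual probe. `checkBucketAccess`, the `Probe` type, and the check name are hypothetical. The transport is a synchronous stub so the sketch runs without a network; a real probe would await the call and return `Promise<Result>`.

```typescript
// Types copied from the contract in src/lib/result.ts.
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";
interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Hypothetical transport: returns an HTTP status, or throws when unreachable.
type Probe = () => { status: number };

// Hypothetical leaf check. It never throws: every path returns a Result.
function checkBucketAccess(probe: Probe | undefined): Result {
  const name = "s3.bucket-access";
  if (probe === undefined) {
    // Deliberately not executed.
    return { outcome: "skip", name, reason: "probe not configured" };
  }
  try {
    const { status } = probe();
    if (status >= 200 && status < 300) {
      return { outcome: "pass", name, context: { status } };
    }
    // The system answered, and the answer was "no".
    return { outcome: "fail", name, reason: `HTTP ${status}`, context: { status } };
  } catch (e) {
    // We could not ask at all (DNS failure, timeout, ...).
    const reason = e instanceof Error ? e.message : String(e);
    return { outcome: "error", name, reason };
  }
}
```

Under these assumptions, a stub returning status 403 yields `fail` with the HTTP status kept in `context`, while a stub that throws yields `error` with the exception message as the `reason`.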
Discriminant field name: outcome, not status
HTTP status codes appear constantly in `context` payloads. Naming the discriminant `status` would nest two unrelated statuses in one record, a readability landmine. `outcome` names the verdict; `status` is reserved for HTTP evidence.
Composite results: recursive
Every Result may carry a children array. A leaf check is a Result with no children; a segment is a Result whose children are leaf Results; a system flow is a Result whose children are segments. One schema, one walker.
This matches the IETF health-check checks field shape and the Kubernetes pod-condition shape. The alternative (a separate Composite type) bifurcates the consumer for no benefit.
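
The "one schema, one walker" point can be illustrated with a short sketch. `walk`, `render`, and the example tree (including its check names) are hypothetical and only show that a single recursive function covers leaves, segments, and system flows.

```typescript
// Types copied from the contract in src/lib/result.ts.
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";
interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Depth-first walk. A leaf is simply a node without children.
function walk(r: Result, visit: (node: Result, depth: number) => void, depth = 0): void {
  visit(r, depth);
  for (const child of r.children ?? []) {
    walk(child, visit, depth + 1);
  }
}

// Hypothetical renderer built on the walker: one indented line per node.
function render(root: Result): string[] {
  const lines: string[] = [];
  walk(root, (node, depth) => {
    const indent = new Array(depth + 1).join("  "); // two spaces per level
    const reason = node.reason ? ` (${node.reason})` : "";
    lines.push(`${indent}${node.outcome} ${node.name}${reason}`);
  });
  return lines;
}

// Illustrative tree: system flow -> segment -> leaf checks.
const exampleTree: Result = {
  outcome: "fail",
  name: "snappy-api",
  reason: "HTTP 403",
  children: [
    {
      outcome: "fail",
      name: "preflight",
      reason: "HTTP 403",
      children: [
        { outcome: "pass", name: "loki.ready" },
        { outcome: "fail", name: "s3.ready", reason: "HTTP 403", context: { status: 403 } },
      ],
    },
  ],
};

console.log(render(exampleTree).join("\n"));
```

A JSON serializer or a future report renderer could reuse the same `walk` function with a different `visit` callback.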
Aggregation rule
A composite's outcome is derived from its children, by severity:

    pass < skip < warn < fail < error

The composite takes the worst child's outcome, with one promotion: composites whose worst child is `skip` promote to `warn` (a partially or fully unverified tree is not a healthy tree). The first non-pass child's `reason` is pinned in the composite's `reason` so a glance at the parent tells the headline story; the full children array stays under `.children`.
The aggregation function lives at src/lib/result.ts as aggregate(name, children, context?).
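
A minimal sketch of how such a function could look, following the rule as stated here rather than reproducing the real `src/lib/result.ts`. Two points are assumptions: an empty composite is treated as `pass`, and the composite's `reason` is left unset when the first non-pass child has none. The `LOKI_URL` name in the usage example is also made up.

```typescript
// Types copied from the contract in src/lib/result.ts.
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";
interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Severity order: pass < skip < warn < fail < error.
const SEVERITY: Record<Outcome, number> = { pass: 0, skip: 1, warn: 2, fail: 3, error: 4 };

function aggregate(name: string, children: Result[], context?: Record<string, unknown>): Result {
  // Worst child by severity. Assumption: no children means "pass".
  let worst: Outcome = "pass";
  for (const child of children) {
    if (SEVERITY[child.outcome] > SEVERITY[worst]) worst = child.outcome;
  }
  // Promotion: a tree whose worst child is "skip" is unverified, so it becomes "warn".
  const outcome: Outcome = worst === "skip" ? "warn" : worst;

  // Pin the reason of the first non-pass child (array order, not severity order).
  let reason: string | undefined;
  for (const child of children) {
    if (child.outcome !== "pass") {
      reason = child.reason;
      break;
    }
  }

  const result: Result = { outcome, name, children };
  if (reason !== undefined) result.reason = reason;
  if (context !== undefined) result.context = context;
  return result;
}

// Usage: one passing leaf plus one skipped leaf.
const preflight = aggregate("preflight", [
  { outcome: "pass", name: "s3.ready" },
  { outcome: "skip", name: "loki.ready", reason: "LOKI_URL not set" },
]);
console.log(preflight.outcome, preflight.reason);
```

Here the worst child is `skip`, so the composite is promoted to `warn` and carries the skipped child's reason.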
Process exit code
The top-level runner inspects the root Result and sets the exit code via exitCodeFor(r):

    pass         -> 0
    warn / skip  -> 0   (run completed; nothing is broken)
    fail / error -> 1   (run is not green)

Cron and CI watch the exit code. The JSON payload carries the tree for humans and tooling.
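
The mapping above is small enough to sketch directly. This is an illustration of the table, not the project's source; the function body is an assumption, only the name `exitCodeFor` comes from this ADR.

```typescript
// Types copied from the contract in src/lib/result.ts.
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";
interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Only the root outcome matters; children were already folded in by aggregation.
function exitCodeFor(r: Result): 0 | 1 {
  switch (r.outcome) {
    case "pass":
    case "warn":
    case "skip":
      return 0; // run completed; nothing is broken
    case "fail":
    case "error":
      return 1; // run is not green
  }
}

// In a Node runner this would typically end with:
//   process.exitCode = exitCodeFor(root);
const root: Result = { outcome: "warn", name: "sqa", reason: "LOKI_URL not set" };
console.log(JSON.stringify(root), exitCodeFor(root));
```

Writing the mapping as an exhaustive `switch` means that adding a sixth outcome to `Outcome` fails type-checking here until the new case is handled.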
Consequences
- Pro: Every check is testable in isolation by its return value. No try/catch around the call site.
- Pro: A run produces one `Result` tree. The same tree drives the exit code, the JSON output, and (future) report renderers.
- Pro: `fail` vs `error` lets a postmortem skip past noise. A 403 from S3 (fail, fix IAM) and a DNS timeout (error, fix the network) no longer look the same.
- Pro: `skip` is a first-class state. Loki being unconfigured in localhost stops being a silent `false`.
- Con: Every probe signature changed from `Promise<boolean>` to `Promise<Result>`. Migration is mechanical (every existing `return false` already had enough log context to upgrade) but touched all eight components plus segment, system, and the top-level runner.
- Con: Composite trees can grow deep. Mitigated by the fixed schema: one walker handles any depth.
- Falsifiability: Revisit if (a) a real check needs an outcome outside the five (suggest one before adding), or (b) a real caller wants to throw on fail instead of inspecting the envelope (the runner is the only place that should react to outcomes; a probe caller that wants to short-circuit should check `result.outcome` and decide).
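
As an illustration of point (b), a caller can short-circuit without throwing by inspecting `outcome`. The sketch below is one possible shape, not an existing helper: `Step` and `runUntilBlocked` are hypothetical, and the steps are synchronous stubs for brevity where real probes would return `Promise<Result>`.

```typescript
// Types copied from the contract in src/lib/result.ts.
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";
interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Hypothetical unit of work: a named check that returns a Result.
interface Step {
  name: string;
  run: () => Result;
}

// Runs steps in order. After the first fail/error, later steps are not executed
// and are recorded as "skip" (prerequisite not met) instead of vanishing.
function runUntilBlocked(steps: Step[]): Result[] {
  const results: Result[] = [];
  let blockedBy: string | undefined;
  for (const step of steps) {
    if (blockedBy !== undefined) {
      results.push({
        outcome: "skip",
        name: step.name,
        reason: `prerequisite not met: ${blockedBy}`,
      });
      continue;
    }
    const r = step.run();
    results.push(r);
    if (r.outcome === "fail" || r.outcome === "error") {
      blockedBy = r.name;
    }
  }
  return results;
}
```

The returned array can be handed to the aggregation function as the children of a segment, so the decision to stop early stays visible in the tree.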
See also
- `src/lib/result.ts`: the contract.
- `docs/guides/add-a-component.md`: the component recipe (updated to use Result).
- `docs/guides/add-a-segment.md`: the segment recipe (updated: aggregate, never throw).
- PRD-01's level-3 contract (recipe section updated).
- IETF Health Check Response Format draft (inadarei): the closest peer-spec.
- JUnit: the source of the fail vs error distinction.