ADR-0012 — System checks are pure functions returning a Result envelope


  • Status: Accepted
  • Date: 2026-05-07
  • Deciders: Natan
  • Supersedes (in spirit): the "throw at the segment boundary" guidance in docs/guides/add-a-segment.md (segments now aggregate into a Result; they do not throw).

Context

Every layer of SQA — components, segments, system flows — is a check. Until now we encoded check outcomes inconsistently:

  • Components (src/components/<system>/ready.ts) returned boolean. true meant "good"; false meant anything else: auth failure, unreachable, wrong response shape, and parse error all collapsed into one bit.
  • Segments (src/systems/snappy-api/preflight.ts) would throw new Error(...) on false. Same call, two different control-flow shapes (return vs throw) depending on outcome.
  • The top-level runner caught those throws in a .catch() and exited non-zero.

Three concrete defects fell out of this:

  1. Lost diagnostic signal. A 403 from S3 (production rejected our IAM) and a DNS timeout (we couldn't reach S3 at all) both became false. Different remediations; same return value.
  2. Impure functions. A check that throws on one input and returns on another isn't a pure function: it has two output channels. That makes composition awkward and testing brittle.
  3. No way to express degraded-but-not-blocking. Loki returning /ready ok with zero recent labels and OpenRouter cost spiking above an alert threshold both want to say "the system answered, but something's off." A boolean can't express it.

Four mature peer systems were surveyed before deciding:

| System | States used |
|---|---|
| IETF Health Check Response Format draft (inadarei) | pass, warn, fail |
| Kubernetes probes | Success, Failure, Unknown |
| JUnit / TestNG | passed, failed, error, skipped, aborted |
| Terraform plugin diagnostics | Error, Warning |

JUnit's split between failed (assertion mismatch) and error (unexpected exception) is the one SQA needed and didn't have.

Decision

Every check at every layer is a pure function returning a Result envelope. No throws. No side effects beyond logging. Same input produces the same output shape, always.

The contract lives at src/lib/result.ts:

ts
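// The verdict of a check; one of the five outcomes.
export type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

export interface Result {
  outcome: Outcome;                   // the verdict (discriminant field)
  name: string;                       // identifies the check
  reason?: string;                    // human-readable explanation of the outcome
  context?: Record<string, unknown>;  // supporting evidence, e.g. an HTTP status
  children?: Result[];                // set on composites (segments, system flows)
}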

The five outcomes

| Outcome | When to use |
|---|---|
| pass | The check ran and the system is healthy by the check's contract. |
| warn | The check ran, the system answered, and the answer is degraded but not blocking. |
| fail | The check ran cleanly and the system answered wrongly (auth denied, wrong shape, missing required value). |
| error | The check could not run to completion: exception, timeout, parse failure. |
| skip | The check was deliberately not executed (env not configured, feature-flagged off, prerequisite not met). |

fail vs error is the key distinction: the system said no vs we couldn't ask. The remediations are different and the envelope must preserve the difference.
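
As a minimal sketch of that distinction, a leaf check might map what it observed onto the envelope like this. The names checkReady, HttpObservation, and probe are hypothetical and not taken from the SQA codebase; the injected probe stands in for the real network call, and the warn case is left out for brevity.

```ts
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Hypothetical: what a probe observed. Not part of the ADR's contract.
interface HttpObservation {
  status: number;
}

// Hypothetical leaf check. `probe` does the I/O and may throw; this wrapper never does.
async function checkReady(
  name: string,
  url: string | undefined,
  probe: (url: string) => Promise<HttpObservation>,
): Promise<Result> {
  // skip: deliberately not executed because the env is not configured.
  if (url === undefined) {
    return { outcome: "skip", name, reason: "url not configured" };
  }
  let seen: HttpObservation;
  try {
    seen = await probe(url);
  } catch (e) {
    // error: we couldn't ask (exception, timeout, parse failure).
    const reason = e instanceof Error ? e.message : String(e);
    return { outcome: "error", name, reason, context: { url } };
  }
  const context = { url, status: seen.status };
  // fail: the system answered, and the answer was "no".
  if (seen.status === 401 || seen.status === 403) {
    return { outcome: "fail", name, reason: "auth denied", context };
  }
  if (seen.status !== 200) {
    return { outcome: "fail", name, reason: `unexpected status ${seen.status}`, context };
  }
  return { outcome: "pass", name, context };
}
```

Because the probe is injected, a test can feed the function a 403 or a thrown timeout and assert on the returned outcome alone.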

Discriminant field name: outcome, not status

HTTP status codes appear constantly in context payloads. Naming the discriminant status would put two unrelated statuses in one record, one nested inside the other: a readability landmine. outcome names the verdict; status is reserved for HTTP evidence.
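
To illustrate, here is a small sketch of a record that carries both fields. The check name s3.ready and the headline helper are hypothetical, chosen only to show that the verdict and the HTTP evidence do not collide.

```ts
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Hypothetical record for an S3 probe that was denied.
const s3Denied: Result = {
  outcome: "fail", // the verdict
  name: "s3.ready",
  reason: "auth denied",
  context: { status: 403 }, // HTTP evidence
};

// Hypothetical helper: reads the verdict and the HTTP evidence side by side.
function headline(r: Result): string {
  const status = r.context?.["status"];
  return typeof status === "number"
    ? `${r.name}: ${r.outcome} (HTTP ${status})`
    : `${r.name}: ${r.outcome}`;
}
```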

Composite results: recursive

Every Result may carry a children array. A leaf check is a Result with no children; a segment is a Result whose children are leaf Results; a system flow is a Result whose children are segments. One schema, one walker.

This matches the IETF health-check checks field shape and the Kubernetes pod-condition shape. The alternative (a separate Composite type) bifurcates the consumer for no benefit.
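
A sketch of what "one walker" could look like: a single recursive function that renders a tree of any depth. The outline function and the check names in the sample tree are illustrative assumptions, not code from the repository.

```ts
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Hypothetical walker: one line per node, indented two spaces per level.
function outline(r: Result, indent = ""): string[] {
  const lines = [`${indent}${r.outcome} ${r.name}`];
  for (const child of r.children ?? []) {
    lines.push(...outline(child, indent + "  "));
  }
  return lines;
}

// Sample tree: system flow > segment > leaf checks (names are made up).
const tree: Result = {
  outcome: "fail",
  name: "snappy-api",
  children: [
    {
      outcome: "fail",
      name: "preflight",
      children: [
        { outcome: "pass", name: "loki.ready" },
        { outcome: "fail", name: "s3.ready", reason: "auth denied" },
      ],
    },
  ],
};

const rendered = outline(tree).join("\n");
// rendered is:
// fail snappy-api
//   fail preflight
//     pass loki.ready
//     fail s3.ready
```

The same function handles a lone leaf, since a missing children array is treated as empty.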

Aggregation rule

A composite's outcome is derived from its children, by severity:

text
pass  <  skip  <  warn  <  fail  <  error

The composite takes the worst child's outcome, with one promotion: composites whose worst child is skip promote to warn (a partially or fully unverified tree is not a healthy tree). The first non-pass child's reason is pinned in the composite's reason so a glance at the parent tells the headline story; the full children array stays under .children.

The aggregation function lives at src/lib/result.ts as aggregate(name, children, context?).
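
The rule above can be sketched as follows. This is an illustration written from the prose, not the contents of src/lib/result.ts; in particular, treating an empty children array as pass is an assumption the ADR does not state.

```ts
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Severity order from the ADR: pass < skip < warn < fail < error.
const SEVERITY: Record<Outcome, number> = { pass: 0, skip: 1, warn: 2, fail: 3, error: 4 };

function aggregate(name: string, children: Result[], context?: Record<string, unknown>): Result {
  // Worst child wins. Assumption: with no children, the composite is pass.
  let worst: Outcome = "pass";
  for (const child of children) {
    if (SEVERITY[child.outcome] > SEVERITY[worst]) worst = child.outcome;
  }
  // The one promotion: a tree whose worst child is skip reports warn.
  const outcome: Outcome = worst === "skip" ? "warn" : worst;

  // Pin the reason of the first non-pass child, in array order.
  let reason: string | undefined;
  for (const child of children) {
    if (child.outcome !== "pass") {
      reason = child.reason;
      break;
    }
  }

  const result: Result = { outcome, name, children };
  if (reason !== undefined) result.reason = reason;
  if (context !== undefined) result.context = context;
  return result;
}

// Usage with hypothetical check names: one healthy leaf, one skipped leaf.
const preflight = aggregate("preflight", [
  { outcome: "pass", name: "s3.ready" },
  { outcome: "skip", name: "loki.ready", reason: "not configured" },
]);
// preflight.outcome is "warn" (skip promoted); preflight.reason is "not configured".
```

Note that the pinned reason comes from the first non-pass child, which is not necessarily the worst one; the worst child only decides the outcome.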

Process exit code

The top-level runner inspects the root Result and sets the exit code via exitCodeFor(r):

text
pass         -> 0
warn / skip  -> 0   (run completed; nothing is broken)
fail / error -> 1   (run is not green)

Cron and CI watch the exit code. The JSON payload carries the tree for humans and tooling.
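
A minimal sketch of that mapping, written from the table above rather than copied from the runner. Only the name exitCodeFor comes from this ADR; the body is an assumption about how it might be implemented.

```ts
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Maps the root outcome to a process exit code.
function exitCodeFor(r: Result): 0 | 1 {
  switch (r.outcome) {
    case "pass":
    case "warn":
    case "skip":
      return 0; // run completed; nothing is broken
    case "fail":
    case "error":
      return 1; // run is not green
  }
}

// In a Node runner this would typically end with something like:
//   process.exitCode = exitCodeFor(root);
```

The switch has no default branch on purpose: if a sixth outcome were ever added to the union, the compiler would flag this function as no longer returning on every path.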

Consequences

  • Pro: Every check is testable in isolation by its return value. No try/catch around the call site.
  • Pro: A run produces one Result tree. The same tree drives the exit code, the JSON output, and (future) report renderers.
  • Pro: fail vs error lets a postmortem skip past noise. A 403 from S3 (fail, fix IAM) and a DNS timeout (error, fix the network) no longer look the same.
  • Pro: skip is a first-class state. Loki being unconfigured on localhost stops being a silent false.
  • Con: Every probe signature changed from Promise<boolean> to Promise<Result>. Migration is mechanical (every existing return false already had enough log context to upgrade) but touched all eight components plus the segment, the system flow, and the top-level runner.
  • Con: Composite trees can grow deep. Mitigated by the fixed schema: one walker handles any depth.
  • Falsifiability: Revisit if (a) a real check needs an outcome outside the five (propose it before adding it), or (b) a real caller wants to throw on fail instead of inspecting the envelope (the runner is the only place that should react to outcomes; a probe caller that wants to short-circuit should check result.outcome and decide).
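
As a sketch of the short-circuit case in (b), a caller can gate a later check on an earlier one by reading result.outcome, with no try/catch involved. The runGated function and the Check alias are hypothetical names introduced for this example.

```ts
type Outcome = "pass" | "warn" | "fail" | "error" | "skip";

interface Result {
  outcome: Outcome;
  name: string;
  reason?: string;
  context?: Record<string, unknown>;
  children?: Result[];
}

// Hypothetical alias for the probe signature described above.
type Check = () => Promise<Result>;

// Runs `deep` only when `preflight` neither failed nor errored.
// Otherwise `deep` is recorded as skip (prerequisite not met), not thrown away.
async function runGated(preflight: Check, deepName: string, deep: Check): Promise<Result[]> {
  const first = await preflight();
  if (first.outcome === "fail" || first.outcome === "error") {
    const skipped: Result = {
      outcome: "skip",
      name: deepName,
      reason: `prerequisite ${first.name} was ${first.outcome}`,
    };
    return [first, skipped];
  }
  return [first, await deep()];
}
```

The returned array can be handed to the aggregation function as children, so the skipped check still shows up in the tree.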

See also

  • The component recipe (updated to use Result).
  • The segment recipe, docs/guides/add-a-segment.md (updated: aggregate, never throw).
  • PRD-01's level-3 contract (recipe section updated).
  • IETF Health Check Response Format draft: the closest peer spec.
  • JUnit: the source of the fail vs error distinction.
