ADR-0007 - sqa scope is operational, not experience

Superseded (2026-05-28). ADR-0052 widens SQA's scope to include value verification — graded, LLM-judged quality of a system's output when it is written as a falsifiable claim in a contract. The body below is preserved per the ADR immutability rule and still explains the failure mode (meaningless GREEN scores) the scope guards against; the guardrail now lives in the falsifier requirement, not in a blanket ban on quality scoring. Ungrounded audience-impression QA (no claim, no falsifier) remains out of scope.

Context

The original April 2026 SQA tool conflated two distinct concerns:

  1. Operational verification - "Is the system responding correctly?" Answerable with deterministic tests against APIs, databases, queues.

  2. Audience-impression verification - "Does a first-time visitor leave wanting to invest?" Answerable only by modelling the audience, defining the experience, and judging quality (manually or via LLM).

Both are valuable. Mixing them produced a tool that scored 75-86 GREEN without telling us anything about audience impression - the failure mode documented in docs/problem.md.

Decision

This codebase covers operational verification only. Audience-impression verification is out of scope, even when adjacent.

Concretely:

  • In scope: HTTP probes, database reachability, queue depth, S3 bucket health, version checks, deployment verification, schema consistency.

  • Out of scope: Quality scoring of AI output, user-flow impression assessment, "is this job impressive?", "does the chat give good advice?". These belong in a separate tool with explicit audience modelling.
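
As a rough illustration of what "in scope" can look like in code, the sketch below models two probes that each return a plain boolean plus a detail string. All names here (`ProbeResult`, `http_status_probe`, `queue_depth_probe`, and the injected `fetch_status` callable) are hypothetical and are not taken from the sqa codebase. The HTTP call is injected so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one operational probe: a boolean, never a graded score."""
    name: str
    ok: bool
    detail: str


def http_status_probe(
    name: str,
    url: str,
    fetch_status: Callable[[str], int],
    expected: int = 200,
) -> ProbeResult:
    """Pass only if the endpoint answers with the expected status code.

    `fetch_status` is an assumed helper that returns the HTTP status for a
    URL and raises OSError when the host is unreachable.
    """
    try:
        status = fetch_status(url)
    except OSError as exc:
        return ProbeResult(name, False, f"unreachable: {exc}")
    return ProbeResult(name, status == expected, f"status={status} expected={expected}")


def queue_depth_probe(name: str, depth: int, max_depth: int) -> ProbeResult:
    """Pass only if the observed queue depth is within the allowed ceiling."""
    return ProbeResult(name, depth <= max_depth, f"depth={depth} max={max_depth}")
```

Neither probe needs a model of the audience or a judgement of quality: the same inputs always produce the same result, which is the property the in-scope list shares.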

Consequences

  • Pro: Probes return booleans / counts, which lets us write unambiguous pass/fail logic. No false sense of confidence from GREEN scores that don't measure what matters.

  • Pro: sqa can be run pre-deploy, post-deploy, in cron, and during incidents - all the same code, all deterministic.

  • Pro: Clear boundary for code review: "this PR adds an audience-quality check" → reject, point to this ADR.

  • Con: Investor-readiness verification still needs a separate tool; it should not be built by extending sqa.

  • Falsifiability: If we add a probe that requires LLM-as-judge or human-judged quality scoring to an sqa component, this ADR is being violated and either (a) the probe needs to move, or (b) we consciously supersede this ADR with a new one.
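
The pass/fail logic mentioned in the first bullet can be sketched as follows. This is a minimal illustration under assumptions, not the sqa implementation: the `verdict` function and the idea of mapping probe names to booleans are hypothetical. The point it shows is that boolean results reduce to an exit code with no weighting or averaging, so a run cannot come out GREEN while a probe is failing.

```python
from typing import List, Mapping, Tuple


def verdict(results: Mapping[str, bool]) -> Tuple[int, List[str]]:
    """Reduce boolean probe results to an exit code and the failing probe names.

    Any single failure fails the run; there is no weighted or averaged score.
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    return (1 if failed else 0), failed
```

Because the function is pure, the same call works pre-deploy, post-deploy, in cron, or during an incident; only the source of the booleans differs.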

See also

falsifiability test.
