ADR-0007 - sqa scope is operational, not experience
- Status: Superseded by ADR-0052
- Date: 2026-05-07
- Deciders: Natan
- Source: value-precision case study 001-metaintro-sqa-tool
Superseded (2026-05-28). ADR-0052 widens SQA's scope to include value verification — graded, LLM-judged quality of a system's output when it is written as a falsifiable claim in a contract. The body below is preserved per the ADR immutability rule and still explains the failure mode (meaningless GREEN scores) the scope guards against; the guardrail now lives in the falsifier requirement, not in a blanket ban on quality scoring. Ungrounded audience-impression QA (no claim, no falsifier) remains out of scope.
Context
The original April 2026 SQA tool conflated two distinct concerns:
- Operational verification - "Is the system responding correctly?" Answerable with deterministic tests against APIs, databases, queues.
- Audience-impression verification - "Does a first-time visitor leave wanting to invest?" Answerable only by modelling the audience, defining the experience, and judging quality (manually or via LLM).
Both are valuable. Mixing them produced a tool that scored 75-86 GREEN without telling us anything about audience impression - the failure mode documented in docs/problem.md.
Decision
This codebase covers operational verification only. Audience-impression verification is out of scope, even when adjacent.
Concretely:
- In scope: HTTP probes, database reachability, queue depth, S3 bucket health, version checks, deployment verification, schema consistency.
- Out of scope: Quality scoring of AI output, user-flow impression assessment, "is this job impressive?", "does the chat give good advice?". These belong in a separate tool with explicit audience modelling.
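To make the in-scope side concrete, here is a minimal Python sketch of what a deterministic probe could look like. Every name in it (`ProbeResult`, `http_status_probe`, `queue_depth_probe`, `overall_pass`) is hypothetical and chosen for illustration; none of it is taken from the sqa codebase. The point is only that each probe reduces to a boolean, and the overall verdict is a plain conjunction rather than a weighted score.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one operational probe (hypothetical shape, not sqa's API)."""
    name: str
    passed: bool      # deterministic pass/fail, never a graded score
    detail: str = ""


def http_status_probe(status_code: int, expected: int = 200) -> ProbeResult:
    """Pass when the observed HTTP status equals the expected one."""
    ok = status_code == expected
    return ProbeResult("http_status", ok, f"status={status_code}, expected={expected}")


def queue_depth_probe(depth: int, max_depth: int) -> ProbeResult:
    """Pass when the observed queue depth is within the allowed limit."""
    ok = depth <= max_depth
    return ProbeResult("queue_depth", ok, f"depth={depth}, max={max_depth}")


def overall_pass(results: List[ProbeResult]) -> bool:
    """A run passes only if at least one probe ran and every probe passed."""
    return bool(results) and all(r.passed for r in results)
```

In this sketch the probes take already-observed values (a status code, a queue depth) so that the pass/fail logic stays separate from I/O; how sqa actually collects those values is not specified here.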
Consequences
- Pro: Probes return booleans / counts, which lets us write unambiguous pass/fail logic. No false sense of confidence from GREEN scores that don't measure what matters.
- Pro: sqa can be run pre-deploy, post-deploy, in cron, and during incidents - all the same code, all deterministic.
- Pro: Clear boundary for code review: "this PR adds an audience-quality check" → reject, point to this ADR.
- Con: Investor-readiness verification still needs a separate tool. Building one is not done by extending sqa.
- Falsifiability: If we add a probe that requires LLM-as-judge or human-judged quality scoring to a sqa component, this ADR is being violated and either (a) the probe needs to move, or (b) we consciously supersede this ADR with a new one.
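The falsifiability condition above could, in principle, be checked mechanically. The following Python sketch assumes a hypothetical convention in which every probe declares how its verdict is produced; the `Judge` enum and `scope_violations` function are illustrative names, not part of sqa. It reflects the rule as this ADR states it, before any later widening of scope.

```python
from enum import Enum
from typing import Dict, List


class Judge(Enum):
    """How a probe reaches its verdict (hypothetical classification)."""
    DETERMINISTIC = "deterministic"
    LLM = "llm"
    HUMAN = "human"


def scope_violations(probes: Dict[str, Judge]) -> List[str]:
    """Return the sorted names of probes that rely on LLM or human judgement."""
    return sorted(
        name for name, judge in probes.items()
        if judge is not Judge.DETERMINISTIC
    )
```

A check like this could run in CI and fail the build when the returned list is non-empty, which would turn the code-review boundary into an automated gate; whether that is worth doing is a separate decision.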
See also
- docs/problem.md - full problem statement and falsifiability test.
- Synthetic drift - the failure mode this scope guards against.