ADR-0052 - SQA scope includes value verification, not just operational checks
ADR-0052 - SQA scope includes value verification, not just operational checks #
- Status: Accepted
- Date: 2026-05-28
- Deciders: Natan
- Supersedes: ADR-0007 (scope is
operational, not experience). ADR-0007's body is preserved per the immutability rule; its scope decision is widened by this ADR.
Context #
ADR-0007 (2026-05-07) drew a hard line: SQA does operational verification only - HTTP probes, database reachability, queue depth, schema consistency - and ruled audience-impression / quality verification ("is this job impressive?", "does the chat give good advice?") explicitly out of scope, to be built as a separate tool. That line was right for the failure mode it guarded against: the April-2026 tool that scored 75-86 GREEN while telling us nothing, because it conflated "the API responded" with "the user got value."
Since then the product has moved. PRD-18 shipped the metaintro-chat job-search probe, whose verifiers (src/systems/metaintro-chat/verifiers/) grade the relevancy and coverage of LLM output with an LLM-as-judge ensemble. ADR-0048 added the score outcome kind precisely to carry continuous, graded quality verdicts. The VALUE.md ↗ contract frames SQA's entire mission as verifying "the value a system owes its users." docs/contracts/metaintro-chat/job-search.md states a single client-facing, quality claim and a numeric verdict.
This is exactly the falsifiability clause ADR-0007 named: "if we add a probe that requires LLM-as-judge or human-judged quality scoring to a SQA component, this ADR is being violated and either the probe needs to move, or we consciously supersede this ADR." We choose to supersede it.
Decision #
SQA verifies whether a system delivers the value it claims to its users. That includes graded, LLM-judged quality of a system's output, when the quality is a falsifiable claim in a contract.
The distinction ADR-0007 protected is preserved, but redrawn - not as operational vs. quality, but as claim-grounded value verification vs. ungrounded audience impression:
- In scope:
- Operational verification (everything ADR-0007 listed: HTTP probes,
database/queue/storage reachability, version & deployment checks, schema consistency).
- Value verification - graded quality of a system's output against a
falsifiable claim in a contract, judged by a named rubric (ADR-0033) and carried by the score outcome (ADR-0048). Example: "the jobs the chat returned are relevant to what the user asked for," scored by an LLM ensemble against a published rubric.
- Out of scope (unchanged in spirit):
- Ungrounded audience-impression QA - "does a first-time visitor leave
wanting to invest?", "is this design impressive?" - i.e. quality judgments with no contract, no client-stated claim, and no falsifier. These remain a separate concern.
The load-bearing rule is the claim: SQA may judge quality only when a contract states the quality as a falsifiable claim with a named falsifier and verification method. A quality score with no claim behind it is the synthetic drift ADR-0007 guarded against, and stays out.
Consequences #
- Pro: The codebase as it actually ships (PRD-18, ADR-0048,
metaintro-chat) is now consistent with its scope ADR. The contradiction the glossary merge surfaced is resolved.
- Pro: The guardrail against meaningless GREEN scores survives, relocated
to the right place - the falsifier. A score is only legitimate if a claim says what would refute it.
- Pro: One tool, one vocabulary. We do not need a separate
"investor-readiness" tool to verify product value, provided the value is written as a falsifiable claim.
- Con: "Value verification" is harder to keep honest than booleans. The
mitigation is the contract discipline (Keep Your Contract: every graded claim names its rubric, falsifier, and band) plus ADR-0033's rubric requirement and ADR-0048's rule that a score never escalates to fail.
- Falsifiability: If a
score-emitting probe ships without a
contract claim that names its falsifier and rubric, this ADR is being violated - the probe is producing an ungrounded quality judgment, and either the claim must be written or the probe must be removed.
See also #
- ADR-0007 - the superseded
operational-only scope and the failure mode it guarded against.
- ADR-0048 - the
scoreoutcome kind that
carries graded verdicts.
- ADR-0033 - LLM-as-judge must name
its rubric.
VALUE.md↗ - SQA's own value contract.docs/concepts/glossary.md- the contract /
claim / score / verdict vocabulary this scope rests on.