Skip to content
SQA Cockpit

ADR-0052 - SQA scope includes value verification, not just operational checks

ADRsUpdated 3 min readEdit on GitHub ↗

ADR-0052 - SQA scope includes value verification, not just operational checks #

  • Status: Accepted
  • Date: 2026-05-28
  • Deciders: Natan
  • Supersedes: ADR-0007 (scope is

operational, not experience). ADR-0007's body is preserved per the immutability rule; its scope decision is widened by this ADR.

Context #

ADR-0007 (2026-05-07) drew a hard line: SQA does operational verification only - HTTP probes, database reachability, queue depth, schema consistency - and ruled audience-impression / quality verification ("is this job impressive?", "does the chat give good advice?") explicitly out of scope, to be built as a separate tool. That line was right for the failure mode it guarded against: the April-2026 tool that scored 75-86 GREEN while telling us nothing, because it conflated "the API responded" with "the user got value."

Since then the product has moved. PRD-18 shipped the metaintro-chat job-search probe, whose verifiers (src/systems/metaintro-chat/verifiers/) grade the relevancy and coverage of LLM output with an LLM-as-judge ensemble. ADR-0048 added the score outcome kind precisely to carry continuous, graded quality verdicts. The VALUE.md contract frames SQA's entire mission as verifying "the value a system owes its users." docs/contracts/metaintro-chat/job-search.md states a single client-facing, quality claim and a numeric verdict.

This is exactly the falsifiability clause ADR-0007 named: "if we add a probe that requires LLM-as-judge or human-judged quality scoring to a SQA component, this ADR is being violated and either the probe needs to move, or we consciously supersede this ADR." We choose to supersede it.

Decision #

SQA verifies whether a system delivers the value it claims to its users. That includes graded, LLM-judged quality of a system's output, when the quality is a falsifiable claim in a contract.

The distinction ADR-0007 protected is preserved, but redrawn - not as operational vs. quality, but as claim-grounded value verification vs. ungrounded audience impression:

  • In scope:
    • Operational verification (everything ADR-0007 listed: HTTP probes,

database/queue/storage reachability, version & deployment checks, schema consistency).

  • Value verification - graded quality of a system's output against a

falsifiable claim in a contract, judged by a named rubric (ADR-0033) and carried by the score outcome (ADR-0048). Example: "the jobs the chat returned are relevant to what the user asked for," scored by an LLM ensemble against a published rubric.

  • Out of scope (unchanged in spirit):
    • Ungrounded audience-impression QA - "does a first-time visitor leave

wanting to invest?", "is this design impressive?" - i.e. quality judgments with no contract, no client-stated claim, and no falsifier. These remain a separate concern.

The load-bearing rule is the claim: SQA may judge quality only when a contract states the quality as a falsifiable claim with a named falsifier and verification method. A quality score with no claim behind it is the synthetic drift ADR-0007 guarded against, and stays out.

Consequences #

  • Pro: The codebase as it actually ships (PRD-18, ADR-0048,

metaintro-chat) is now consistent with its scope ADR. The contradiction the glossary merge surfaced is resolved.

  • Pro: The guardrail against meaningless GREEN scores survives, relocated

to the right place - the falsifier. A score is only legitimate if a claim says what would refute it.

  • Pro: One tool, one vocabulary. We do not need a separate

"investor-readiness" tool to verify product value, provided the value is written as a falsifiable claim.

  • Con: "Value verification" is harder to keep honest than booleans. The

mitigation is the contract discipline (Keep Your Contract: every graded claim names its rubric, falsifier, and band) plus ADR-0033's rubric requirement and ADR-0048's rule that a score never escalates to fail.

  • Falsifiability: If a score-emitting probe ships without a

contract claim that names its falsifier and rubric, this ADR is being violated - the probe is producing an ungrounded quality judgment, and either the claim must be written or the probe must be removed.

See also #

operational-only scope and the failure mode it guarded against.

carries graded verdicts.

its rubric.

claim / score / verdict vocabulary this scope rests on.

Was this page helpful?