Write a contract

Diátaxis form: how-to. Go from zero to a contract SQA can run against - the promise your system makes to its users, written as falsifiable claims.

A contract is where you write down what your system promises its users - not its spec, its value - as a small set of falsifiable claims. SQA reads the contract, drives your live system, and returns a verdict on whether each claim held. This guide takes you from a blank file to a contract that a run can check.

When to use this

You're adopting SQA for a system (a SUT) and you want it verified against what it owes its users, not just whether it responds. Write the contract first; the verifier code follows it.

Where it lives

One file per scenario:

```text
docs/contracts/<sut>/<scenario>.md
```
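As a rough illustration of the layout above, the path convention can be written as a small TypeScript helper. `contractPath` and its segment check are hypothetical, introduced here for illustration and not part of SQA; only the `docs/contracts/<sut>/<scenario>.md` pattern comes from this page.

```typescript
// Hypothetical helper, not part of SQA: builds the contract path for one scenario.
// Assumption: <sut> and <scenario> are single, non-empty path segments.
function contractPath(sut: string, scenario: string): string {
  for (const segment of [sut, scenario]) {
    if (segment.length === 0 || segment.indexOf("/") !== -1) {
      throw new Error(`invalid path segment: "${segment}"`);
    }
  }
  return `docs/contracts/${sut}/${scenario}.md`;
}

// The canonical example named on this page.
console.log(contractPath("metaintro-chat", "job-search"));
// docs/contracts/metaintro-chat/job-search.md
```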

SQA writes contracts about the SUT, never for it (ADR-0032) - the team shipping the system owns the promise; SQA is the witness that tries to falsify it. The canonical example is docs/contracts/metaintro-chat/job-search.md.

The shape of a claim

Each claim is one load-bearing sentence with five parts (full definitions in the glossary):

| Part | Question it answers |
| --- | --- |
| Promise | What does the system owe the user, in one falsifiable sentence? |
| Strength | MUST / SHOULD / MAY - how load-bearing is it? |
| Status | Hypothesized / Committed / Verified / Broken |
| Verification method | How is it checked - Judge, Test, Demonstration, …? |
| Falsifier | What would you see that proves the promise was broken? |

The falsifier is the test of whether you even have a claim: if nothing observable could refute it, it's a wish, not a claim.
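One way to picture the five parts is as a typed record. The sketch below is an assumption about how a claim could be modelled in TypeScript, not SQA's actual schema: the `Claim` interface, its field names, and `hasFalsifier` are hypothetical, and the status on the example is a placeholder because this page does not show the real contract's status.

```typescript
type Strength = "MUST" | "SHOULD" | "MAY";
type Status = "Hypothesized" | "Committed" | "Verified" | "Broken";
// The page lists Judge, Test, Demonstration and then trails off with "…";
// only the three named methods are modelled here.
type VerificationMethod = "Judge" | "Test" | "Demonstration";

// Hypothetical shape: one field per part of a claim.
interface Claim {
  promise: string;
  strength: Strength;
  status: Status;
  verification: VerificationMethod;
  falsifier: string;
}

// A claim with no observable falsifier is a wish, so a blank one is rejected.
function hasFalsifier(claim: Claim): boolean {
  return claim.falsifier.trim().length > 0;
}

const relevancyClaim: Claim = {
  promise:
    "The jobs the metaintro-chat search engine returned, in the chat thread, are relevant to what the user asked for.",
  strength: "MUST",
  status: "Hypothesized", // placeholder: the real status is not shown on this page
  verification: "Judge",
  falsifier: "≥2 of the top-5 returned jobs violate a hard constraint the user set.",
};

const wish: Claim = { ...relevancyClaim, falsifier: "   " };

console.log(hasFalsifier(relevancyClaim), hasFalsifier(wish)); // true false
```

A check like this only catches an empty falsifier; whether the falsifier is concrete and observable still needs a human read.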

Steps

  1. Name the client and the value. One sentence: who is served and what they get. If you can't write it, the rest is premature - that's the litmus test. (The client is the role the promise is made to - a user, an agent, or another system.)
  2. Write the claim(s). One promise per claim. If two things could fail independently, write two claims. Keep them in the user's language, not the API's.
  3. Choose a verification method. Pick the lightest one that can falsify the claim. Graded quality (e.g. "the results are relevant") is a Judge - an LLM ensemble or rule scoring evidence against a named rubric. A boolean fact ("a job card renders") is a Test or Demonstration.
  4. State the falsifier. Write the concrete, observable event that means the claim is wrong - who observes what against which state. Not "it breaks"; "≥2 of the top-5 returned jobs violate a hard constraint the user set."
  5. Set strength and status. MUST if breakage is a fail; start at Hypothesized and advance to Committed/Verified as evidence accrues. Never edit an accepted claim in place - supersede it, so past runs still point at the same sentence.
  6. Wire the run. The scenario at src/systems/<sut>/<scenario>.ts drives the live system and emits a Result per claim - a graded claim emits a score (0-1, banded; shown 0-100). See add-a-system for the scenario scaffolding.
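The last step mentions a Result per claim and a score that is held as 0-1, banded, and shown as 0-100. The sketch below illustrates that arithmetic only. The `Result` union, the `Band` type, the band labels, and the thresholds are all assumptions made for this example; the real types come from the scenario scaffolding described in add-a-system.

```typescript
// Hypothetical Result shape: graded claims carry a score, boolean claims a flag.
type Result =
  | { claimId: string; kind: "graded"; score: number } // score in [0, 1]
  | { claimId: string; kind: "boolean"; held: boolean };

interface Band {
  label: string;
  min: number; // inclusive lower bound on the 0-1 scale
}

// Convert the internal 0-1 score to the 0-100 value shown to readers.
function toDisplayScore(score: number): number {
  if (!isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`score out of range: ${score}`);
  }
  return Math.round(score * 100);
}

// Pick the highest band whose lower bound the score reaches.
function bandFor(score: number, bands: Band[]): string {
  const sorted = bands.slice().sort((a, b) => b.min - a.min);
  for (const band of sorted) {
    if (score >= band.min) return band.label;
  }
  throw new Error("no band covers this score");
}

// Illustrative thresholds only; the real bands are defined by the contract.
const exampleBands: Band[] = [
  { label: "low", min: 0 },
  { label: "mid", min: 0.6 },
  { label: "high", min: 0.8 },
];

const results: Result[] = [
  { claimId: "job-search/relevancy", kind: "graded", score: 0.82 },
  { claimId: "job-search/card-renders", kind: "boolean", held: true },
];

for (const r of results) {
  if (r.kind === "graded") {
    console.log(r.claimId, toDisplayScore(r.score), bandFor(r.score, exampleBands));
  } else {
    console.log(r.claimId, r.held ? "held" : "broken");
  }
}
```

Keeping the 0-1 score as the stored value and deriving the 0-100 figure at display time avoids two sources of truth for the same verdict.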

Worked example

The entire promise of the metaintro-chat job-search contract is one claim:

The jobs the metaintro-chat search engine returned, in the chat thread, are relevant to what the user asked for.
  • Strength: MUST · Verification: Judge (LLM ensemble, relevancy rubric) · Verdict: a score 0-100, not pass/fail.
  • Falsifier: mean per-job relevancy below the band on a fresh run against the user's actual profile, or ≥2 of the top-5 jobs violating a hard constraint (e.g. a "remote" query returning an on-site job).

That single claim is what the JSI probe verifies on every run. Read the full contract: metaintro-chat · job-search.
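The falsifier in this example combines two conditions, and it may help to see them as a single predicate. This is a minimal sketch under stated assumptions, not the JSI probe's code: `JudgedJob`, `isFalsified`, the 0-1 relevancy scale, and the band value passed in are hypothetical, and the input is assumed to be ordered by rank. The page does not say how an empty result list is judged, so the sketch refuses to decide that case.

```typescript
// Hypothetical per-job evidence produced by the judge.
interface JudgedJob {
  relevancy: number; // assumed to be on a 0-1 scale
  violatesHardConstraint: boolean; // e.g. an on-site job for a "remote" query
}

// True when either condition of the falsifier is met.
// Assumption: rankedJobs is ordered best-first, so the top 5 are the first 5.
function isFalsified(rankedJobs: JudgedJob[], band: number): boolean {
  if (rankedJobs.length === 0) {
    throw new Error("no jobs to judge; this sketch does not define that case");
  }
  const total = rankedJobs.reduce((sum, job) => sum + job.relevancy, 0);
  const meanRelevancy = total / rankedJobs.length;
  const topFiveViolations = rankedJobs
    .slice(0, 5)
    .filter((job) => job.violatesHardConstraint).length;
  return meanRelevancy < band || topFiveViolations >= 2;
}

const remoteQueryRun: JudgedJob[] = [
  { relevancy: 0.9, violatesHardConstraint: false },
  { relevancy: 0.8, violatesHardConstraint: true }, // on-site job
  { relevancy: 0.9, violatesHardConstraint: false },
  { relevancy: 0.7, violatesHardConstraint: true }, // on-site job
  { relevancy: 0.8, violatesHardConstraint: false },
];

// Mean relevancy is above the illustrative band of 0.7, but two of the
// top five jobs break a hard constraint, so the claim is falsified.
console.log(isFalsified(remoteQueryRun, 0.7)); // true
```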

Anti-patterns

  • A "contract" with no value statement. It's a TODO list. Write the one-sentence promise first or mark it provisional.
  • A claim with no falsifier. It's a wish - nothing could prove it wrong.
  • A passive falsifier ("if it breaks we'll notice"). Name who runs it, against what state, and what they observe.
  • Editing an accepted claim's text. Supersede instead; the change log is what makes the contract auditable.
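The last anti-pattern, editing an accepted claim, is easier to avoid if superseding is the only available operation. The sketch below shows one possible shape for that; `ClaimRecord`, `ChangeLogEntry`, `supersede`, and the claim ids are hypothetical and do not describe how SQA stores contracts.

```typescript
// Hypothetical record: accepted text is never rewritten, only pointed past.
interface ClaimRecord {
  id: string;
  promise: string;
  supersededBy?: string;
}

interface ChangeLogEntry {
  supersedes: string;
  supersededBy: string;
  reason: string;
}

// Returns a new claim list plus a change-log entry; the old text is untouched.
function supersede(
  claims: readonly ClaimRecord[],
  oldId: string,
  replacement: { id: string; promise: string },
  reason: string
): { claims: ClaimRecord[]; entry: ChangeLogEntry } {
  const matches = claims.filter((c) => c.id === oldId);
  if (matches.length === 0) throw new Error(`unknown claim: ${oldId}`);
  if (matches[0].supersededBy !== undefined) {
    throw new Error(`${oldId} is already superseded`);
  }
  if (claims.some((c) => c.id === replacement.id)) {
    throw new Error(`duplicate claim id: ${replacement.id}`);
  }
  const updated: ClaimRecord[] = claims.map((c) =>
    c.id === oldId ? { ...c, supersededBy: replacement.id } : c
  );
  updated.push({ id: replacement.id, promise: replacement.promise });
  return {
    claims: updated,
    entry: { supersedes: oldId, supersededBy: replacement.id, reason },
  };
}

const before: ClaimRecord[] = [{ id: "C1", promise: "Returned jobs are relevant." }];
const after = supersede(
  before,
  "C1",
  { id: "C2", promise: "Returned jobs are relevant and respect hard constraints." },
  "made the hard-constraint promise explicit"
);

console.log(after.claims.length, after.claims[0].promise === before[0].promise); // 2 true
```

Because the old record keeps its id and text, a past run that cites C1 still points at the sentence it was judged against.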

Next

  • The vocabulary every claim uses: Glossary.
  • The long-form treatment of contracts, verdicts, and metrics: Contract · Verdict · Metrics.
