Write a contract #
Diátaxis form: how-to. Go from zero to a contract SQA can run against - the promise your system makes to its users, written as falsifiable claims.
A contract is where you write down what your system promises its users - not its spec, its value - as a small set of falsifiable claims. SQA reads the contract, drives your live system, and returns a verdict on whether each claim held. This guide takes you from a blank file to a contract that a run can check.
When to use this #
You're adopting SQA for a system under test (a SUT) and you want it verified against what it owes its users, not just whether it responds. Write the contract first; the verifier code follows it.
Where it lives #
One file per scenario:
docs/contracts/<sut>/<scenario>.md
SQA writes contracts about the SUT, never for it (ADR-0032) - the team shipping the system owns the promise; SQA is the witness that tries to falsify it. The canonical example is docs/contracts/metaintro-chat/job-search.md.
The shape of a claim #
Each claim is one load-bearing sentence with five parts (full definitions in the glossary):
| Part | Question it answers |
|---|---|
| Promise | What does the system owe the user, in one falsifiable sentence? |
| Strength | MUST / SHOULD / MAY - how load-bearing is it? |
| Status | Hypothesized → Committed → Verified → Broken |
| Verification method | How is it checked - Judge, Test, Demonstration, …? |
| Falsifier | What would you see that proves the promise was broken? |
The falsifier is the test of whether you even have a claim: if nothing observable could refute it, it's a wish, not a claim.
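The five parts in the table can be sketched as a type. This is a minimal illustration only: the field names, the `Claim` interface, and the `isClaim` helper are assumptions made for this example, not SQA's actual schema, and the list of verification methods is limited to the three the table names.

```typescript
// Hypothetical types: they mirror the five parts in the table,
// but SQA's real schema may use different names and shapes.
type Strength = "MUST" | "SHOULD" | "MAY";
type Status = "Hypothesized" | "Committed" | "Verified" | "Broken";
// Only the methods named in the table; the glossary may list more.
type VerificationMethod = "Judge" | "Test" | "Demonstration";

interface Claim {
  promise: string; // one falsifiable sentence, in the user's language
  strength: Strength;
  status: Status;
  verification: VerificationMethod;
  falsifier: string; // the observable event that refutes the promise
}

// A claim with no observable falsifier is a wish, not a claim.
function isClaim(c: Claim): boolean {
  return c.promise.trim().length > 0 && c.falsifier.trim().length > 0;
}

const relevance: Claim = {
  promise:
    "The jobs returned in the chat thread are relevant to what the user asked for.",
  strength: "MUST",
  status: "Hypothesized",
  verification: "Judge",
  falsifier:
    "≥2 of the top-5 returned jobs violate a hard constraint the user set.",
};

// Same promise with the falsifier removed: it no longer qualifies.
const wish: Claim = { ...relevance, falsifier: "" };
```

Making `falsifier` a required field means a draft without one cannot be constructed by accident, so the gap shows up before any run does.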
Steps #
- Name the client and the value. One sentence: who is served and what
they get. If you can't write it, the rest is premature - that's the litmus test. (The client is the role the promise is made to - a user, an agent, or another system.)
- Write the claim(s). One promise per claim. If two things could fail
independently, write two claims. Keep them in the user's language, not the API's.
- Choose a verification method. Pick the lightest one that can falsify the
claim. Graded quality (e.g. "the results are relevant") calls for a Judge - an LLM ensemble or rule scoring evidence against a named rubric. A boolean fact ("a job card renders") calls for a Test or Demonstration.
- State the falsifier. Write the concrete, observable event that means the
claim is wrong - who observes what against which state. Not "it breaks"; "≥2 of the top-5 returned jobs violate a hard constraint the user set."
- Set strength and status. MUST if breakage is a fail; start at
Hypothesized and advance to Committed/Verified as evidence accrues. Never edit an accepted claim in place - supersede it, so past runs still point at the same sentence.
- Wire the run. The scenario at src/systems/<sut>/<scenario>.ts drives
the live system and emits a Result per claim - a graded claim emits a score (0-1, banded; shown 0-100). See add-a-system for the scenario scaffolding.
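The last step can be sketched as follows: one Result per claim, with a graded score held in the 0-1 range and displayed as 0-100. Everything named here is hypothetical (the `Result` fields, the `Band` labels, `gradedResult`, `runScenario`, and the 0.7 band floor); the real scenario scaffolding is described in add-a-system and may look quite different.

```typescript
// Hypothetical Result shape: SQA's real type may differ.
type Band = "in-band" | "below-band";

interface Result {
  claimId: string;
  score: number; // graded score in [0, 1]
  shown: number; // the same score as displayed, 0-100
  band: Band;
}

// Clamp a raw judge score into [0, 1] and band it against a floor.
// The floor value is an assumption; the source does not state the bands.
function gradedResult(claimId: string, rawScore: number, bandFloor: number): Result {
  if (!Number.isFinite(rawScore)) {
    throw new Error(`claim ${claimId}: score is not a finite number`);
  }
  const score = Math.min(1, Math.max(0, rawScore));
  return {
    claimId,
    score,
    shown: Math.round(score * 100),
    band: score >= bandFloor ? "in-band" : "below-band",
  };
}

// Stand-in for a scenario run: takes already-judged scores keyed by claim
// and returns one Result per claim. A real scenario would drive the live
// system to obtain these scores.
function runScenario(scores: Record<string, number>, bandFloor: number): Result[] {
  return Object.entries(scores).map(([claimId, raw]) =>
    gradedResult(claimId, raw, bandFloor),
  );
}

const results = runScenario({ "job-search/relevance": 0.82 }, 0.7);
```

Keeping `score` and `shown` as separate fields avoids re-deriving the display value in every report, at the cost of storing the same number twice.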
Worked example #
The entire promise of the metaintro-chat job-search contract is one claim:
The jobs the metaintro-chat search engine returned, in the chat thread, are relevant to what the user asked for.
- Strength: MUST · Verification: Judge (LLM ensemble, relevancy
rubric) · Verdict: a score 0-100, not pass/fail.
- Falsifier: mean per-job relevancy below the band on a fresh run against
the user's actual profile, or ≥2 of the top-5 jobs violating a hard constraint (e.g. a "remote" query returning an on-site job).
That single claim is what the JSI probe verifies on every run. Read the full contract: metaintro-chat · job-search.
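The falsifier in this example has two independent triggers, which a short sketch makes explicit. The `JudgedJob` shape, the `checkFalsifier` function, and the 0.7 band floor are assumptions for illustration; the source does not say how the JSI probe computes either condition.

```typescript
// Hypothetical per-job evidence, as a judge might produce it.
interface JudgedJob {
  title: string;
  relevancy: number; // 0-1, scored against the relevancy rubric
  violatesHardConstraint: boolean; // e.g. on-site job for a "remote" query
}

interface FalsifierCheck {
  falsified: boolean;
  meanRelevancy: number;
  topFiveViolations: number;
}

// Falsified if mean relevancy is below the band floor, OR if at least
// two of the top-5 jobs violate a hard constraint the user set.
function checkFalsifier(jobs: JudgedJob[], bandFloor: number): FalsifierCheck {
  if (jobs.length === 0) {
    // Assumption: with no jobs there is nothing to judge, so fail loudly
    // rather than decide what an empty result means for the claim.
    throw new Error("no jobs to judge");
  }
  const meanRelevancy =
    jobs.reduce((sum, job) => sum + job.relevancy, 0) / jobs.length;
  const topFiveViolations = jobs
    .slice(0, 5)
    .filter((job) => job.violatesHardConstraint).length;
  return {
    falsified: meanRelevancy < bandFloor || topFiveViolations >= 2,
    meanRelevancy,
    topFiveViolations,
  };
}

// A "remote" query where two of the five returned jobs are on-site.
const remoteQuery: JudgedJob[] = [
  { title: "Backend Engineer (remote)", relevancy: 0.9, violatesHardConstraint: false },
  { title: "Platform Engineer (on-site)", relevancy: 0.8, violatesHardConstraint: true },
  { title: "API Engineer (remote)", relevancy: 0.85, violatesHardConstraint: false },
  { title: "SRE (on-site)", relevancy: 0.9, violatesHardConstraint: true },
  { title: "Data Engineer (remote)", relevancy: 0.8, violatesHardConstraint: false },
];

const check = checkFalsifier(remoteQuery, 0.7);
```

In this data the mean relevancy sits above the assumed floor, yet the claim is still falsified by the two constraint violations, which is why the two conditions are joined with "or" rather than folded into one score.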
Anti-patterns #
- A "contract" with no value statement. It's a TODO list. Write the
one-sentence promise first or mark it provisional.
- A claim with no falsifier. It's a wish - nothing could prove it wrong.
- A passive falsifier ("if it breaks we'll notice"). Name who runs it,
against what state, and what they observe.
- Editing an accepted claim's text. Supersede instead; the change log is
what makes the contract auditable.
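The first three anti-patterns are mechanical enough to lint for. The sketch below is one possible approach, not an SQA feature: `DraftContract`, `Falsifier`, and `lintContract` are hypothetical names, and splitting the falsifier into observer, state, and observation is an assumption drawn from the "who runs it, against what state, and what they observe" wording above. The fourth anti-pattern needs change history and is out of scope here.

```typescript
// Hypothetical structured falsifier: who observes what against which state.
interface Falsifier {
  observer: string;
  state: string;
  observation: string;
}

interface DraftClaim {
  promise: string;
  falsifier?: Falsifier;
}

interface DraftContract {
  value: string; // the one-sentence promise: who is served and what they get
  provisional?: boolean;
  claims: DraftClaim[];
}

// Returns one human-readable problem per anti-pattern found.
function lintContract(contract: DraftContract): string[] {
  const problems: string[] = [];
  if (contract.value.trim() === "" && !contract.provisional) {
    problems.push("no value statement: write the promise or mark the contract provisional");
  }
  contract.claims.forEach((claim, index) => {
    const label = `claim ${index + 1}`;
    const falsifier = claim.falsifier;
    if (!falsifier) {
      problems.push(`${label}: no falsifier, so nothing could prove it wrong`);
      return;
    }
    const missing = (["observer", "state", "observation"] as const).filter(
      (key) => falsifier[key].trim() === "",
    );
    if (missing.length > 0) {
      problems.push(`${label}: passive falsifier, missing ${missing.join(", ")}`);
    }
  });
  return problems;
}

const draft: DraftContract = {
  value: "",
  claims: [
    { promise: "Search results are good." },
    {
      promise: "Returned jobs are relevant to what the user asked for.",
      falsifier: {
        observer: "",
        state: "a fresh run against the user's actual profile",
        observation: "≥2 of the top-5 jobs violate a hard constraint",
      },
    },
  ],
};

const problems = lintContract(draft);
```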
Next #
- The vocabulary every claim uses: Glossary.
- The long-form treatment of contracts, verdicts, and metrics:
- See it run: a real contract.