Documentation

What is SQA?

SQA - System Quality Assurance - is a platform that helps developers and product owners deliver better products by verifying the value a system actually delivers to its users.

You can't improve what you can't measure - so SQA makes value measurable. You write down what a system promises its users as a contract: a set of claims, each a promise you can verify - and when the value can't be measured directly, the claim says exactly how you'd know it was delivered. You make the promises; SQA is the independent witness that verifies whether they're kept.

Unlike unit tests or uptime monitors - which confirm a system matches its spec - SQA verifies whether the product actually delivers the value it promised its users. Everything can be green while the product still isn't landing, because the spec itself can be wrong.

How it works

Four moves, from understanding the system to a report you can act on:

1. Understand the SUT
The System Under Test - say metaintro-chat: its components, how it works, and how to drive it, observe it, and collect evidence.
2. Write the contract
Capture what the system promises its users as claims - e.g. “a user can log in via email, password, or social auth,” “a job search returns jobs that match the query.” Each claim says how it’s verified.
3. Build the scenario
The technical part: a sequence of atomic steps that drive the live system, read the result, and verify it against the claim - with evidence. (Shown up close below.)
4. Read the report
What happened: each claim's result, who triggered the run, and the evidence underneath every one.
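
To make step 2 concrete, here is a minimal sketch of a contract as plain data: a named set of claims, each carrying its own verification method. The class and field names (`Contract`, `Claim`, `statement`, `verified_by`, `unverifiable`) are illustrative assumptions, not SQA's actual schema; the claim texts come from the examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """One promise the system makes, plus how it would be verified (hypothetical shape)."""
    statement: str    # what the system promises its users
    verified_by: str  # how you would know the promise was kept


@dataclass
class Contract:
    """A named set of claims for one system under test (hypothetical shape)."""
    sut: str
    name: str
    claims: list[Claim] = field(default_factory=list)

    def unverifiable(self) -> list[Claim]:
        """Return the claims that do not yet say how they are verified."""
        return [c for c in self.claims if not c.verified_by.strip()]


contract = Contract(
    sut="metaintro-chat",
    name="job-search",
    claims=[
        Claim(
            statement="a job search returns jobs that match the query",
            verified_by="an LLM judge scores each returned job's relevancy",
        ),
        Claim(
            statement="a user can log in via email, password, or social auth",
            verified_by="",  # left blank on purpose to show the check below
        ),
    ],
)

# A claim with no verification method is not yet a promise anyone can check.
for claim in contract.unverifiable():
    print(f"missing verification: {claim.statement}")
```

Keeping the verification method on the claim itself is the point of the sketch: a promise without one is flagged before any scenario is built.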

The scenario

The contract says what to verify. The scenario is how: a sequence of atomic steps that drive the live system, read what came back, and verify it against the claim - keeping evidence at every step. A real run, against the job-search contract for metaintro-chat (the example system under test):

metaintro-chat · job-search · “senior react engineer remote”

claim: the jobs the chat returned are relevant to what the user asked for.

drive: log in → finish onboarding → open a thread → submit the query

observe: capture the job cards the assistant returned

verify: an LLM judge scores each returned job's relevancy, then averages them

verdict: Relevancy 67 / 100 (yellow)

The report names which jobs scored low and why - backed by the run video, screenshots, the returned jobs, and the judge's per-job scores. A graded claim lands in a band: green strong · yellow partial · red weak. Other claims are simply pass / fail.
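
The grading in the run above can be sketched as follows: average the judge's per-job scores, then map the result to a band. This is a rough illustration, not SQA's implementation. The band cut-offs (`GREEN_MIN`, `YELLOW_MIN`), the function names, and the per-job scores are all assumptions; the source gives only the overall 67 / 100 and its yellow band.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical band cut-offs; the real thresholds are not stated in the docs.
GREEN_MIN = 80
YELLOW_MIN = 50


def band(score: float) -> str:
    """Map a 0-100 score to green (strong), yellow (partial), or red (weak)."""
    if score >= GREEN_MIN:
        return "green"
    if score >= YELLOW_MIN:
        return "yellow"
    return "red"


def grade(per_job_scores: dict[str, float]) -> tuple[int, str, list[str]]:
    """Average the per-job scores; return (overall, band, low-scoring jobs)."""
    if not per_job_scores:
        raise ValueError("no jobs were returned, nothing to grade")
    overall = round(mean(list(per_job_scores.values())))
    # Jobs whose own score falls in the red band are the ones a report would name.
    low = [job for job, score in per_job_scores.items() if band(score) == "red"]
    return overall, band(overall), low


# Made-up per-job scores, chosen only so the average matches the 67 above.
scores = {"job-a": 90, "job-b": 75, "job-c": 36}
overall, colour, low = grade(scores)
print(f"Relevancy {overall} / 100 ({colour}); low-scoring: {low}")
```

Keeping the per-job scores alongside the average is what lets a report say which jobs pulled the verdict down, rather than only that it was yellow.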

See it spelled out in the metaintro-chat contract, or watch live runs.

Start here

New here

Why SQA exists

The failure mode it catches: every check green, the product still not doing its job.

Learn the model

Concepts & how it works

The nine core nouns - including SUT, contract, claim, scenario, step, run, and outcome - and how they connect.

See what's verified

A real contract

What a live system promises its users, written as falsifiable claims a run checks.

Or browse live runs to see SQA in action. Every doc is in the sidebar; press ⌘K to search.