SQA Cockpit
Documentation

What is SQA?

SQA - System Quality Assurance - is a platform that helps developers and product owners deliver better products by verifying the value a system actually delivers to its users.

You can't improve what you can't measure - so SQA makes value measurable. You write down what a system promises its users as a contract: a set of claims, each a promise you can verify - and when the value can't be measured directly, the claim says exactly how you'd know it was delivered. You make the promises; SQA is the independent witness that verifies whether they're kept.

Unlike unit tests or uptime monitors - which confirm a system matches its spec - SQA verifies whether the product actually delivers the value it promised its users. Everything can be green while the product still isn't landing, because the spec itself can be wrong.

How it works

Four moves, from understanding the system to a report you can act on:

  1. Understand the SUT

    The System Under Test - say metaintro-chat: its components, how it works, and how to drive it, observe it, and collect evidence.

  2. Write the contract

    Capture what the system promises its users as claims - e.g. “a user can log in via email, password, or social auth,” “a job search returns jobs that match the query.” Each claim says how it’s verified.

  3. Build the scenario

    The technical part: a sequence of atomic steps that drive the live system, read the result, and verify it against the claim - with evidence. (Shown up close below.)

  4. Read the report

    What happened: each claim's result, who triggered the run, and the evidence underneath every one.
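As a rough illustration of step 2, a contract can be thought of as plain data: a list of claims, each pairing a promise with how it is verified. The sketch below is not SQA's actual contract format; the `Claim` and `Contract` classes, their field names, and the wording of the verification strings are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass(frozen=True)
class Claim:
    """One promise to users plus how it is verified (hypothetical shape)."""
    promise: str
    verification: str
    # Graded claims land in a band; the rest are simply pass / fail.
    kind: Literal["graded", "pass_fail"] = "pass_fail"


@dataclass
class Contract:
    """The set of claims for one system under test (hypothetical shape)."""
    sut: str
    claims: list[Claim] = field(default_factory=list)

    def unverifiable(self) -> list[Claim]:
        # A claim that does not say how it is verified is not yet a usable claim.
        return [c for c in self.claims if not c.verification.strip()]


# The two example claims from step 2; the verification text is illustrative.
contract = Contract(
    sut="metaintro-chat",
    claims=[
        Claim(
            promise="A user can log in via email, password, or social auth.",
            verification="Drive each login method and check the user ends up signed in.",
        ),
        Claim(
            promise="A job search returns jobs that match the query.",
            verification="An LLM judge scores each returned job's relevancy; the scores are averaged.",
            kind="graded",
        ),
    ],
)

print(len(contract.claims), "claims,", len(contract.unverifiable()), "without a verification")
```

Keeping the verification next to the promise makes a claim with no stated way to verify it easy to spot before any scenario is built.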

The scenario

The contract says what to verify. The scenario is how: a sequence of atomic steps that drive the live system, read what came back, and verify it against the claim - keeping evidence at every step. A real run, against the job-search contract for metaintro-chat (the example system under test):

metaintro-chat · job-search · “senior react engineer remote”

claim: the jobs the chat returned are relevant to what the user asked for.

drive: log in → finish onboarding → open a thread → submit the query
observe: capture the job cards the assistant returned
verify: an LLM judge scores each returned job's relevancy, then averages them
verdict: Relevancy 67 / 100 (yellow)

The report names which jobs scored low and why - backed by the run video, screenshots, the returned jobs, and the judge's per-job scores. A graded claim lands in a band: green strong · yellow partial · red weak. Other claims are simply pass / fail.
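The verify step for a graded claim can be sketched as two small functions: average the judge's per-job scores, then map the average to a band. This is a minimal sketch, not SQA's implementation. The band thresholds (`GREEN_MIN`, `YELLOW_MIN`), the function names, and the per-job scores are all assumptions; the scores were chosen only so the average matches the 67 / 100 in the example run.

```python
from statistics import mean

# Assumed band thresholds - the page does not state the real cut-offs.
GREEN_MIN = 80
YELLOW_MIN = 50


def grade(per_job_scores):
    """Average per-job relevancy scores (0-100) and map the result to a band."""
    if not per_job_scores:
        raise ValueError("no jobs were returned, so there is nothing to grade")
    score = round(mean(per_job_scores))
    if score >= GREEN_MIN:
        band = "green"
    elif score >= YELLOW_MIN:
        band = "yellow"
    else:
        band = "red"
    return score, band


def low_scorers(scores_by_job, cutoff=YELLOW_MIN):
    """Names of jobs scoring below the cutoff, for the report to call out."""
    return [job for job, s in scores_by_job.items() if s < cutoff]


# Made-up judge output for five returned job cards (not data from a real run).
scores_by_job = {"job-a": 90, "job-b": 80, "job-c": 70, "job-d": 55, "job-e": 40}

score, band = grade(list(scores_by_job.values()))
print(f"Relevancy {score} / 100 {band}")
print("scored low:", low_scorers(scores_by_job))
```

Keeping the per-job scores alongside the average is what would let a report name the individual jobs that pulled the verdict down, rather than showing only the final number.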

See it spelled out in the metaintro-chat contract, or watch live runs.

Start here

Browse live runs to see SQA in action. Every doc is in the sidebar; press ⌘K to search.