SQA Cockpit
The contract
Snappy’s promise: crawl and structure the web into a reliable, high-quality corpus — complete, fresh, and on-target for what was asked.
The claim SQA tests (C0): the jobs that the metaintro-chat search engine returned in the chat thread are relevant to what the user asked for.
This run tested the system against its contract, clause by clause. A single run can only witness some clauses; the rest stay UNKNOWN — never a faked pass.
1 pass · 0 fail · 7 unknown
  • C0 MUST
    Headline promise
    relevancy score = 64/100 (pass-band ≥ 60)
    PASS
  • C1 MUST
    User can sign in
    no 'login' step in this run
    UNKNOWN
  • C2 MUST
    User can open a new thread
    no 'open-thread' step in this run
    UNKNOWN
  • C3 SHOULD
    Onboarding gate completes
    no 'onboarding' step in this run
    UNKNOWN
  • C4 SHOULD
    Filters from onboarding don't bias the query
    no 'clear-filters' step in this run
    UNKNOWN
  • C5 MUST
    User-typed query is what the engine sees
    no 'submit-query' step in this run
    UNKNOWN
  • C10 SHOULD
    Score holds across reruns
    needs a sweep: a single run cannot witness this clause
    UNKNOWN
  • C12 MAY
    Run completes within budget
    needs a sweep: a single run cannot witness this clause
    UNKNOWN
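The tri-state rule above (a clause is PASS or FAIL only when the run witnessed it, otherwise UNKNOWN) can be sketched as follows. This is a minimal illustration, not SQA's implementation: the `Clause` and `Verdict` types, `evaluate`, `tally`, and the observation format are hypothetical names chosen for the example. Only the step names, the C0 pass-band, and the resulting tally come from the report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Set, Tuple


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Clause:  # hypothetical type, for illustration only
    cid: str                      # clause id, e.g. "C0"
    level: str                    # "MUST", "SHOULD" or "MAY"
    required_step: Optional[str]  # step the run must contain to witness the clause
    needs_sweep: bool = False     # True if only a multi-run sweep can witness it


def evaluate(clause: Clause, steps_in_run: Set[str],
             observations: Dict[str, Tuple[bool, str]]) -> Tuple[Verdict, str]:
    """Return (verdict, reason); an unwitnessed clause is UNKNOWN, never PASS."""
    if clause.needs_sweep:
        return Verdict.UNKNOWN, "needs a sweep: a single run cannot witness this clause"
    if clause.required_step is not None and clause.required_step not in steps_in_run:
        return Verdict.UNKNOWN, f"no '{clause.required_step}' step in this run"
    if clause.cid not in observations:
        return Verdict.UNKNOWN, "no observation recorded"
    passed, reason = observations[clause.cid]
    return (Verdict.PASS if passed else Verdict.FAIL), reason


def tally(verdicts: Iterable[Verdict]) -> str:
    """Count verdicts and format them like the contract summary line."""
    counts = {v: 0 for v in Verdict}
    for v in verdicts:
        counts[v] += 1
    return (f"{counts[Verdict.PASS]} pass · {counts[Verdict.FAIL]} fail · "
            f"{counts[Verdict.UNKNOWN]} unknown")


# Clause table mirroring the Contract above; step names come from the report.
CLAUSES = [
    Clause("C0", "MUST", None),  # witnessed by the relevancy judge, not by a UI step
    Clause("C1", "MUST", "login"),
    Clause("C2", "MUST", "open-thread"),
    Clause("C3", "SHOULD", "onboarding"),
    Clause("C4", "SHOULD", "clear-filters"),
    Clause("C5", "MUST", "submit-query"),
    Clause("C10", "SHOULD", None, needs_sweep=True),
    Clause("C12", "MAY", None, needs_sweep=True),
]

score = 64
observations = {"C0": (score >= 60, f"relevancy score = {score}/100 (pass-band ≥ 60)")}
# The report does not name this run's 3 steps; none of them match a clause step.
results = [evaluate(c, set(), observations) for c in CLAUSES]
print(tally(v for v, _ in results))  # 1 pass · 0 fail · 7 unknown
```

Checking sweep-only and missing-step clauses before looking at any observation is what keeps an unwitnessed clause from ever being reported as a pass.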
TL;DR · 30-second primer
  • Snappy (the system under test, SUT) was exercised in 1 run on profile corpus-f500.
  • Result: Corpus Quality Index 64/100, verdict Mixed. See "Why this verdict"; each gap there maps to a claim in the Contract.

1 ·THE VERDICT

the answer in one number
(Chart: 30-day CQI history)
SNAPPY · DOMAIN ACTIVATION · RUN #9

Mixed.

Run #9 of Snappy on profile corpus-f500, for the query "CQI sweep — domain-activation". Corpus Quality Index: 64/100.

Verdict: FAIL (C5). Corpus dropped to the YELLOW band as the freshness regression compounded.

AI synthesis · openai/gpt-4o-mini

The system failed to do its job in the domain-activation run, scoring 64/100 on the CQI. The primary issues were a drop in corpus quality to the YELLOW band, caused by a freshness regression, and a warning that classification accuracy slipped because the classifier was fed stale snapshots. Together these account for the failing verdict.

2 ·WHY THIS VERDICT

ranked by severity
HARD

Corpus dropped to the YELLOW band as the freshness regression compounded

Expected
Corpus-mean CQI ≥ 70 (GREEN)
Observed
Corpus-mean CQI fell to 64 (YELLOW); 23% of orgs now serve snapshots >30d old as the cron stall entered its third day
Why it matters
Below the YELLOW floor the registry is no longer trustworthy for customer-facing freshness SLAs.
Recommended action · 2 sprints
Page on-call; recover freshness cursor and re-crawl the affected shards.
✓ verified · judge: freshness-checker + registry-cqi
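The two measurements behind this finding, the corpus-mean CQI band and the share of orgs serving snapshots older than 30 days, can be sketched as below. This is a minimal illustration under stated assumptions: the GREEN floor (70) and the 30-day staleness cutoff come from the report, but the YELLOW floor of 50, the `BELOW_YELLOW` label, the function names, and the per-org data are hypothetical, not what freshness-checker or registry-cqi actually compute.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Dict

GREEN_FLOOR = 70                  # from the report: GREEN means corpus-mean CQI >= 70
YELLOW_FLOOR = 50                 # ASSUMPTION: hypothetical; the report does not give this boundary
STALE_AFTER = timedelta(days=30)  # from the report: snapshots > 30 days old count as stale


def cqi_band(corpus_mean: float) -> str:
    """Map a corpus-mean CQI to a band label."""
    if corpus_mean >= GREEN_FLOOR:
        return "GREEN"
    if corpus_mean >= YELLOW_FLOOR:
        return "YELLOW"
    return "BELOW_YELLOW"  # hypothetical label for anything under the assumed floor


def stale_share(last_snapshot_at: Dict[str, datetime], now: datetime) -> float:
    """Fraction of orgs whose newest snapshot is older than STALE_AFTER."""
    if not last_snapshot_at:
        return 0.0
    stale = sum(1 for ts in last_snapshot_at.values() if now - ts > STALE_AFTER)
    return stale / len(last_snapshot_at)


# Toy data for illustration only, not this run's registry.
per_org_cqi = [80, 60, 52]
now = datetime(2026, 5, 16, 13, 0, tzinfo=timezone.utc)
snapshots = {
    "org-a": now - timedelta(days=2),
    "org-b": now - timedelta(days=45),
    "org-c": now - timedelta(days=10),
    "org-d": now - timedelta(days=31),
}
print(cqi_band(mean(per_org_cqi)), stale_share(snapshots, now))  # YELLOW 0.5
```

The two measurements are kept separate on purpose. A corpus can stay in a passable band on average while the stale share climbs, and that combination is the compounding the finding describes.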
SOFT

Classification accuracy slipped as stale snapshots fed the classifier

Expected
Industry-classification agreement ≥ 78% vs gold labels
Observed
Agreement dropped to 68% on orgs with stale snapshots
Why it matters
Stale homepage content misroutes industry classification downstream.
Recommended action · 1 sprint
Re-classify after freshness recovery.
✓ verified · judge: classify-industry
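Agreement with gold labels, computed separately for orgs with stale and fresh snapshots, is what exposes this kind of slip. The following is a minimal sketch, assuming labels are plain strings keyed by org id. The 78% target comes from the report. The `agreement` function and the toy labels are hypothetical and are not classify-industry's interface.

```python
from typing import Dict, Iterable

AGREEMENT_TARGET = 0.78  # from the report: >= 78% agreement with gold labels


def agreement(predicted: Dict[str, str], gold: Dict[str, str], orgs: Iterable[str]) -> float:
    """Share of the given orgs (those that have a gold label) whose predicted label matches."""
    labelled = [o for o in orgs if o in gold]
    if not labelled:
        raise ValueError("no gold-labelled orgs in this slice")
    hits = sum(1 for o in labelled if predicted.get(o) == gold[o])
    return hits / len(labelled)


# Toy labels for illustration only.
gold = {"org-a": "computer-systems-design", "org-b": "computer-systems-design",
        "org-c": "retail", "org-d": "banking"}
predicted = {"org-a": "computer-systems-design", "org-b": "retail",
             "org-c": "retail", "org-d": "insurance"}
stale = {"org-b", "org-d"}
fresh = set(gold) - stale

for name, slice_ in (("stale", stale), ("fresh", fresh)):
    score = agreement(predicted, gold, slice_)
    print(name, score, score >= AGREEMENT_TARGET)  # stale 0.0 False, then fresh 1.0 True
```

Splitting by snapshot freshness shows whether the drop is confined to stale orgs. That is the pattern that supports re-classifying after freshness recovery instead of retraining the classifier.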

3 ·THE STORY

what went in, what came out

Input

what the probe sent in
Query: (not shown)
Skills: (no skill inferred) · ESCO
Industry: Computer Systems Design · NAICS 541512
Location: United States · ISO US
Education: Bachelor or equivalent · ISCED 6
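Each input facet pairs a human-readable value with a code from a named taxonomy (ESCO, NAICS, ISO, ISCED). A record like the one below captures that shape. It is a minimal sketch: `Facet`, `ProbeInput`, and `describe` are hypothetical names, and the probe's real input schema is not shown in this report.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Facet:
    label: str   # human-readable value, e.g. "United States"
    scheme: str  # taxonomy the code comes from, e.g. "NAICS"
    code: str    # code within that taxonomy, e.g. "541512"


@dataclass
class ProbeInput:  # hypothetical schema, for illustration only
    query: str
    skills: List[Facet] = field(default_factory=list)  # ESCO skills; empty when none inferred
    industry: Optional[Facet] = None
    location: Optional[Facet] = None
    education: Optional[Facet] = None

    def describe(self) -> List[str]:
        """Render the input as the 'Field: value (SCHEME code)' lines used in reports."""
        lines = [f"Query: {self.query or '(not shown)'}"]
        skills = ", ".join(f"{s.label} ({s.scheme} {s.code})" for s in self.skills)
        lines.append(f"Skills: {skills or '(no skill inferred)'}")
        for name, facet in (("Industry", self.industry),
                            ("Location", self.location),
                            ("Education", self.education)):
            if facet is not None:
                lines.append(f"{name}: {facet.label} ({facet.scheme} {facet.code})")
        return lines


probe = ProbeInput(
    query="",  # the report shows no query text for this run
    industry=Facet("Computer Systems Design", "NAICS", "541512"),
    location=Facet("United States", "ISO", "US"),
    education=Facet("Bachelor or equivalent", "ISCED", "6"),
)
print("\n".join(probe.describe()))
```

Storing the taxonomy code next to the label lets a judge compare returned jobs by code (for example NAICS 541512) instead of by free-text industry names.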

5 ·SESSION RECORDING

watch what the probe saw

No recording available for Metaintro.

6 ·RUN MECHANICS

provenance & reproducibility
Duration: 3m 48.9s
Steps: 3
Judges: (none listed)
Commit: demo-seed
Started: 2026-05-16 13:00 UTC
Trace: (none linked)

Evidence by step

every artifact, link, excerpt, row, metric & recording — grouped by the step that produced it

No evidence recorded for this run.

Evidence integrity

each artifact is SHA-256 hashed at capture, so any later modification can be detected

No integrity manifest recorded for this run.
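Hash-at-capture integrity of the kind this section describes can be sketched with Python's standard `hashlib`. The manifest layout (artifact name mapped to hex digest) and the artifact names are hypothetical. A manifest only proves an artifact is unmodified if the manifest itself is stored somewhere that whoever writes the artifacts cannot alter.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: Dict[str, bytes]) -> Dict[str, str]:
    """Hash every artifact at capture time: name -> SHA-256 hex digest."""
    return {name: sha256_hex(data) for name, data in artifacts.items()}


def verify(artifacts: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return the names of artifacts that are missing or no longer match the manifest."""
    return sorted(name for name, digest in manifest.items()
                  if name not in artifacts or sha256_hex(artifacts[name]) != digest)


# Hypothetical artifact names and contents, for illustration only.
captured = {"step-1/excerpt.txt": b"homepage excerpt", "step-2/rows.json": b'[{"org": "org-a"}]'}
manifest = build_manifest(captured)
print(verify(captured, manifest))                                 # []
print(verify({**captured, "step-2/rows.json": b"[]"}, manifest))  # ['step-2/rows.json']
```

Reporting missing artifacts alongside mismatched ones matters here. A run that silently drops evidence should fail verification just as a run with edited evidence does.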

7 ·SYSTEM ANATOMY

which component drove the verdict

Verdict driver: Unattributed. SQA did not attribute the failure to a specific component in this run.
