The contract
kai’s promise: store what matters and recall it on demand — surface the right memory for the ask, ranked, with no relevant memory missed.
The claim SQA tests: The jobs the metaintro-chat search engine returned, in the chat thread, are relevant to what the user asked for. (claim C0).
This run tested the system against its contract, clause by clause. A single run can only witness some clauses; the rest stay UNKNOWN — never a faked pass.
1 pass · 0 fail · 7 unknown
- C0 · MUST · PASS · Headline promise · relevancy score = 88/100 (pass-band ≥ 60)
- C1 · MUST · UNKNOWN · User can sign in · no 'login' step in this run
- C2 · MUST · UNKNOWN · User can open a new thread · no 'open-thread' step in this run
- C3 · SHOULD · UNKNOWN · Onboarding gate completes · no 'onboarding' step in this run
- C4 · SHOULD · UNKNOWN · Filters from onboarding don't bias the query · no 'clear-filters' step in this run
- C5 · MUST · UNKNOWN · User-typed query is what the engine sees · no 'submit-query' step in this run
- C10 · SHOULD · UNKNOWN · Score holds across reruns · needs a sweep; a single run cannot witness this clause
- C12 · MAY · UNKNOWN · Run completes within budget · needs a sweep; a single run cannot witness this clause
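The three-valued verdict rule described above (a clause with no witnessing evidence stays UNKNOWN rather than passing by default) can be sketched as follows. This is an illustrative sketch only, not the tool's actual implementation: the `Clause` class, the `evidence` dictionary, and the key names such as `relevancy_score` are assumptions made for the example. Only the clause IDs, levels, step names, and the pass-band of 60 come from the report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

PASS_BAND = 60  # from the report: pass-band >= 60


@dataclass
class Clause:
    cid: str
    level: str                        # MUST / SHOULD / MAY
    evidence_key: Optional[str]       # hypothetical key; None means "needs a sweep"
    check: Callable[[object], bool] = bool


def verdict(clause: Clause, evidence: dict) -> str:
    """Return PASS or FAIL only when the run actually witnessed the clause."""
    if clause.evidence_key is None or clause.evidence_key not in evidence:
        return "UNKNOWN"              # never a faked pass
    return "PASS" if clause.check(evidence[clause.evidence_key]) else "FAIL"


CLAUSES = [
    Clause("C0", "MUST", "relevancy_score", lambda s: s >= PASS_BAND),
    Clause("C1", "MUST", "login"),
    Clause("C2", "MUST", "open-thread"),
    Clause("C3", "SHOULD", "onboarding"),
    Clause("C4", "SHOULD", "clear-filters"),
    Clause("C5", "MUST", "submit-query"),
    Clause("C10", "SHOULD", None),    # a single run cannot witness this clause
    Clause("C12", "MAY", None),       # a single run cannot witness this clause
]

# Assumed shape of what this run recorded: only the headline score.
evidence = {"relevancy_score": 88}

tally = Counter(verdict(c, evidence) for c in CLAUSES)
print(f'{tally["PASS"]} pass · {tally["FAIL"]} fail · {tally["UNKNOWN"]} unknown')
```

Under these assumptions the tally matches the summary line of this run. Making UNKNOWN the default branch means that adding a new clause can never raise the pass count until some run records evidence for it.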
TL;DR · 30-second primer
- KAI (SUT) ran 1 run on profile longmem-phase-a.
- Result: Memory Recall Index 88/100. Strong recall; see “Why this verdict” (each gap maps to a claim in the Contract).
1 ·THE VERDICT
the answer in one number
30-day MRI history
KAI · MEMORY RECALL · RUN #23
Strong recall.
Run #23 of kai on profile longmem-phase-a for the query "MRI sweep — memory-recall". Memory Recall Index 88/100.
Verdict PASS: every step completed cleanly; nothing pulled the verdict down.
AI synthesis · openai/gpt-4o-mini
The system completed the memory-recall task with a strong Memory Recall Index (MRI) of 88 out of 100, which indicates effective recall for the "MRI sweep" query. The run took 37.5 seconds.
2 ·WHY THIS VERDICT
ranked by severity
3 ·THE STORY
what went in, what came out
Input
what the probe sent in
Query
5 ·SESSION RECORDING
watch what the probe saw
Session recording
No recording available for Metaintro.
6 ·RUN MECHANICS
provenance & reproducibility
Duration: 37.54s
Steps: 1
Judges: —
Commit: demo-seed
Started: 2026-05-26 13:00 UTC
Trace
Evidence by step
every artifact, link, excerpt, row, metric & recording, grouped by the step that produced it
No evidence recorded for this run.
Evidence integrity
each artifact is SHA-256 hashed at capture, as proof it is unmodified
No integrity manifest recorded for this run.
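As a rough illustration of hash-at-capture integrity checking, the sketch below builds a manifest of SHA-256 digests and later reports any artifact whose bytes no longer match. It uses Python's standard `hashlib`. The function names, the manifest shape (artifact name mapped to hex digest), and the sample artifact name are assumptions made for the example, not the report tool's actual format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per artifact at capture time."""
    return {name: sha256_hex(blob) for name, blob in artifacts.items()}


def find_modified(artifacts: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Names listed in the manifest that are missing or whose digest changed."""
    return [
        name
        for name, digest in manifest.items()
        if name not in artifacts or sha256_hex(artifacts[name]) != digest
    ]


# Hypothetical artifact captured during a step.
artifacts = {"step-1/response.json": b'{"score": 88}'}
manifest = build_manifest(artifacts)

untouched = find_modified(artifacts, manifest)

# Simulate tampering after capture.
tampered = dict(artifacts)
tampered["step-1/response.json"] = b'{"score": 99}'
changed = find_modified(tampered, manifest)

print(untouched, changed)
```

A digest comparison like this only shows that the bytes are unchanged relative to the manifest; the manifest itself would need to be stored or signed separately for the check to carry weight.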
7 ·SYSTEM ANATOMY
which component drove the verdict
Every component held; no failure attributed.