The contract

kai’s promise: store what matters and recall it on demand — surface the right memory for the ask, ranked, with no relevant memory missed.

The claim SQA tests (claim C0): the jobs the metaintro-chat search engine returned in the chat thread are relevant to what the user asked for.

This run tested the system against its contract, clause by clause. A single run can only witness some clauses; the rest stay UNKNOWN — never a faked pass.

1 pass · 0 fail · 7 unknown
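The witness rule above can be sketched in a few lines of Python. This is a minimal illustration, not the harness's actual code: the `Clause` type, the `judge` and `tally` helpers, and the step names `relevancy` and `sweep` are assumptions made for the example. The other step names and the pass-band of 60 come from the report itself.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNKNOWN = "unknown"


@dataclass
class Clause:
    clause_id: str     # e.g. "C0"
    level: str         # "MUST", "SHOULD" or "MAY"
    witness_step: str  # step that can witness the clause (names partly hypothetical)


def judge(clause: Clause, observed: dict[str, bool]) -> Verdict:
    """Return UNKNOWN when no step witnessed the clause; never default to PASS."""
    if clause.witness_step not in observed:
        return Verdict.UNKNOWN
    return Verdict.PASS if observed[clause.witness_step] else Verdict.FAIL


def tally(clauses: list[Clause], observed: dict[str, bool]) -> str:
    """Summarise verdicts in the report's 'N pass · N fail · N unknown' form."""
    counts = Counter(judge(c, observed) for c in clauses)
    return " · ".join(f"{counts[v]} {v.value}" for v in Verdict)


CLAUSES = [
    Clause("C0", "MUST", "relevancy"),      # hypothetical step name
    Clause("C1", "MUST", "login"),
    Clause("C2", "MUST", "open-thread"),
    Clause("C3", "SHOULD", "onboarding"),
    Clause("C4", "SHOULD", "clear-filters"),
    Clause("C5", "MUST", "submit-query"),
    Clause("C10", "SHOULD", "sweep"),       # hypothetical step name
    Clause("C12", "MAY", "sweep"),          # hypothetical step name
]

# Only the headline score was witnessed in this run: 88 against a pass-band of 60.
OBSERVED = {"relevancy": 88 >= 60}

print(tally(CLAUSES, OBSERVED))
```

Keeping UNKNOWN as a distinct value, rather than folding it into PASS or FAIL, is what lets the summary line stay honest about clauses the run never exercised.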

C0 · MUST · Headline promise · relevancy score = 88/100 (pass-band ≥ 60) · PASS
C1 · MUST · User can sign in · no 'login' step in this run · UNKNOWN
C2 · MUST · User can open a new thread · no 'open-thread' step in this run · UNKNOWN
C3 · SHOULD · Onboarding gate completes · no 'onboarding' step in this run · UNKNOWN
C4 · SHOULD · Filters from onboarding don't bias the query · no 'clear-filters' step in this run · UNKNOWN
C5 · MUST · User-typed query is what the engine sees · no 'submit-query' step in this run · UNKNOWN
C10 · SHOULD · Score holds across reruns · needs a sweep: a single run cannot witness this clause · UNKNOWN
C12 · MAY · Run completes within budget · needs a sweep: a single run cannot witness this clause · UNKNOWN
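Clauses C10 and C12 stay UNKNOWN because one run gives no spread to measure. A sweep check might look like the sketch below. It is an assumption-laden illustration: the function name `sweep_verdicts`, the 5-point spread tolerance, and the 60-second budget are invented for the example, since the report states neither threshold. Only the pass-band of 60 is taken from the contract.

```python
def sweep_verdicts(scores, durations_s, max_spread=5.0, budget_s=60.0, pass_band=60):
    """Judge C10 (score holds across reruns) and C12 (run within budget).

    max_spread and budget_s are hypothetical thresholds, not values from the report.
    """
    # A single run (or mismatched data) cannot witness either clause.
    if len(scores) < 2 or len(scores) != len(durations_s):
        return {"C10": "UNKNOWN", "C12": "UNKNOWN"}

    spread = max(scores) - min(scores)
    holds = spread <= max_spread and min(scores) >= pass_band
    in_budget = max(durations_s) <= budget_s
    return {
        "C10": "PASS" if holds else "FAIL",
        "C12": "PASS" if in_budget else "FAIL",
    }


# This report's single run: both clauses remain UNKNOWN.
print(sweep_verdicts([88], [37.54]))

# Made-up rerun values, for illustration only (not measured results).
print(sweep_verdicts([88, 86, 90], [37.5, 40.1, 35.2]))
```

Using the worst case (minimum score, maximum duration) rather than the mean keeps one bad rerun from being averaged away.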

TL;DR · 30-second primer

·KAI (SUT) completed 1 run on profile longmem-phase-a.
·Result: Memory Recall Index 88/100. Strong recall; see “Why this verdict” (each gap maps to a claim in the Contract).

1 ·THE VERDICT

the answer in one number

30-day MRI history

KAI · MEMORY RECALL · RUN #23

Strong recall.

Run #23 of kai on profile longmem-phase-a for the query "MRI sweep — memory-recall". Memory Recall Index 88/100.

Verdict PASS: every step completed cleanly; nothing pulled the verdict down.

AI synthesis · openai/gpt-4o-mini

The system successfully completed the memory-recall task, achieving a strong Memory Recall Index (MRI) score of 88 out of 100. This high score indicates effective performance in recalling memory related to the MRI sweep. The entire process took 37.5 seconds, demonstrating efficiency in execution.

2 ·WHY THIS VERDICT

ranked by severity

No gaps reported for this run.

3 ·THE STORY

what went in, what came out

Input

what the probe sent in

Query

Skills: (no skill inferred) · ESCO —

Industry: Computer Systems Design · NAICS 541512

Location: United States · ISO US

Education: Bachelor or equivalent · ISCED 6

5 ·SESSION RECORDING

watch what the probe saw

No recording available for Metaintro.

6 ·RUN MECHANICS

provenance & reproducibility

Duration

37.54s

Steps

Judges

—

Commit

demo-seed

Started

2026-05-26 13:00 UTC

Trace

Evidence by step

every artifact, link, excerpt, row, metric & recording — grouped by the step that produced it

No evidence recorded for this run.

Evidence integrity

each artifact is SHA-256 hashed at capture — proof it is unmodified

No integrity manifest recorded for this run.
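For readers unfamiliar with hash-at-capture integrity checks, the idea can be sketched with Python's standard `hashlib`. This is an illustration under assumptions: the helper names (`build_manifest`, `verify`) and the artifact path are hypothetical, and the report does not describe its real manifest format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Hash every artifact at capture time; the result is the integrity manifest."""
    return {name: sha256_hex(blob) for name, blob in artifacts.items()}


def verify(artifacts: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names of artifacts that are missing or whose digest changed."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in artifacts or sha256_hex(artifacts[name]) != digest
    )


# Hypothetical artifact captured by a step.
captured = {"step-1/response.json": b'{"score": 88}'}
manifest = build_manifest(captured)

print(verify(captured, manifest))  # nothing modified

tampered = {"step-1/response.json": b'{"score": 99}'}
print(verify(tampered, manifest))  # the altered artifact is reported
```

A matching digest shows the artifact is byte-identical to what was hashed at capture. It does not by itself show who produced the manifest, so the manifest needs to be stored where it cannot be rewritten alongside the artifacts.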

7 ·SYSTEM ANATOMY

which component drove the verdict

Every component held — no failure attributed.