The contract

Metaintro Chat’s promise: tell us what you need — we return relevant, quality job results based on your ask, verified daily, no ghost jobs.

The claim SQA tests: the jobs returned by Metaintro Chat answer what the user asked for, as judged by an LLM ensemble against the query and the user's profile (claim C8).

This run tested the system against its contract, clause by clause. A single run can only witness some clauses; the rest stay UNKNOWN — never a faked pass.

9 pass · 0 fail · 4 unknown

C0 MUST
Headline promise
relevancy score = 70/100 (pass-band ≥ 60)
PASS
C1 MUST
User can sign in
login = pass
PASS
C2 MUST
User can open a new thread
open-thread = pass
PASS
C3 SHOULD
Onboarding gate completes
onboarding = pass
PASS
C4 SHOULD
Filters from onboarding don't bias the query
clear-filters = pass
PASS
C5 MUST
User-typed query is what the engine sees
submit-query = pass
PASS
C6 MUST
The chat returns job cards
wait-for-jobs = pass
PASS
C7 MUST
Job cards have required fields
no 'card-shape' step in this run
UNKNOWN
C8 MUST
Returned jobs are relevant to the query · HEADLINE
relevancy score = 70/100 (pass-band ≥ 60)
PASS
C9 SHOULD
All aspects of the query are covered
relevancy score = 70/100 (pass-band ≥ 60)
PASS
C10 SHOULD
Score holds across reruns
needs a sweep: a single run cannot witness this clause
UNKNOWN
C11 SHOULD
Competitive vs LinkedIn / Indeed / Google
needs a sweep: a single run cannot witness this clause
UNKNOWN
C12 MAY
Run completes within budget
needs a sweep: a single run cannot witness this clause
UNKNOWN
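The tally above (9 pass · 0 fail · 4 unknown) follows one rule: a clause is PASS or FAIL only when this run produced evidence for it, and UNKNOWN otherwise. A minimal Python sketch of that rule, assuming hypothetical names (`Clause`, `evaluate`, `tally`) and assuming score-based clauses pass when the score meets the pass-band threshold of 60 shown in the report; the harness's actual implementation is not part of this report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of one contract clause (not the harness's real types).
@dataclass
class Clause:
    cid: str
    level: str                           # "MUST", "SHOULD" or "MAY"
    needs_sweep: bool = False            # only multiple runs can witness it
    step_passed: Optional[bool] = None   # journey-step outcome; None if no step ran
    score: Optional[int] = None          # 0-100 judge score; None if not scored
    pass_band: int = 60                  # assumed minimum passing score


def evaluate(c: Clause) -> str:
    """Return PASS, FAIL or UNKNOWN; missing evidence is never a pass."""
    if c.needs_sweep:
        return "UNKNOWN"  # a single run cannot witness this clause
    if c.score is not None:
        return "PASS" if c.score >= c.pass_band else "FAIL"
    if c.step_passed is not None:
        return "PASS" if c.step_passed else "FAIL"
    return "UNKNOWN"  # e.g. no 'card-shape' step in this run


def tally(clauses: list[Clause]) -> Counter:
    """Count clause outcomes by status."""
    return Counter(evaluate(c) for c in clauses)


# The clauses of run #25, encoded with the evidence the report shows.
run = [
    Clause("C0", "MUST", score=70),
    Clause("C1", "MUST", step_passed=True),
    Clause("C2", "MUST", step_passed=True),
    Clause("C3", "SHOULD", step_passed=True),
    Clause("C4", "SHOULD", step_passed=True),
    Clause("C5", "MUST", step_passed=True),
    Clause("C6", "MUST", step_passed=True),
    Clause("C7", "MUST"),                      # no card-shape evidence
    Clause("C8", "MUST", score=70),
    Clause("C9", "SHOULD", score=70),
    Clause("C10", "SHOULD", needs_sweep=True),
    Clause("C11", "SHOULD", needs_sweep=True),
    Clause("C12", "MAY", needs_sweep=True),
]
counts = tally(run)
print(f"{counts['PASS']} pass · {counts['FAIL']} fail · {counts['UNKNOWN']} unknown")
# → 9 pass · 0 fail · 4 unknown
```

The part worth keeping in any real implementation is the final fallback: absent evidence maps to UNKNOWN, never to PASS.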

TL;DR · 30-second primer

·Metaintro Chat (the system under test, SUT) was exercised in 1 run on behalf of 1 seeker (P1, Lena Park).
·The chat returned 10 jobs. The judge scored them.
·Result: Job-Seeker Index 70/100. Mostly relevant; see “Why this verdict” (each gap maps to a claim in the Contract).
·Benchmarked against 3 competitors (LinkedIn / Indeed / Google) in section 4.

1 ·THE VERDICT

the answer in one number

METAINTRO CHAT · JSI · RUN #25

Mostly relevant.

Run #25 of metaintro-chat on profile P1 for the query "senior react engineer remote". Job-Seeker Index 70/100.

Verdict WARN. Checks behind it: c2-relevancy outcome score; c3-coverage outcome score; legacy-composite outcome score; baselines (disabled; set captureBaselines: true to enable). All journey steps (login, open-thread, onboarding, clear-filters, submit-query, wait-for-jobs, observe, c1-job-card-shape) passed.

AI synthesis · openai/gpt-4o-mini

The system successfully returned job listings but received a warning due to degraded relevancy, scoring 70 out of 100. This lower score indicates that while ten jobs were provided, they did not closely match the query for a senior React engineer role. Notably, the results included positions like Senior Full Stack Developer and Fullstack Engineer, which may not align with the specific request for React expertise.

2 ·WHY THIS VERDICT

ranked by severity

3 ·THE STORY

what went in, what came out

Input

the same input was run against all 4 platforms

Query

"senior react engineer remote"

Job-seeker profile

P1 (Lena): Mid-Senior IC Engineer, used in this run
P2 (Marcus): Career Switcher · Finance → Data
P3 (Adaora): Early Career / New Grad
P4 (Wei-Cheng): Senior Specialist · ML Infra
P5 (Sofía): Non-Technical · Product Marketing

Field	Value	Code
Skills	React.js, JavaScript	ESCO S6.0.2
Industry	Software Publishers	NAICS 511210
Location	Remote · United States	ISO US · remote=true
Education	Bachelor or equivalent	ISCED 6
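Each profile field pairs a human-readable value with a code from a public classification: ESCO for skills, NAICS for industry, an ISO 3166-1 country code for location, and an ISCED 2011 level for education (level 6 is bachelor's or equivalent). A minimal Python sketch of P1 as a typed, validated record; the class name `SeekerProfile` and all field names are hypothetical, and the harness's real profile schema is not shown in this report.

```python
from dataclasses import dataclass


# Hypothetical schema for a job-seeker profile; field names are illustrative.
@dataclass(frozen=True)
class SeekerProfile:
    pid: str
    name: str
    persona: str
    skills: tuple[str, ...]
    esco_ref: str        # ESCO reference exactly as shown in the report
    industry: str
    naics: str           # NAICS industry code (2 to 6 digits)
    country: str         # ISO 3166-1 alpha-2 country code
    remote: bool
    education: str
    isced_level: int     # ISCED 2011 level, 0-8

    def __post_init__(self) -> None:
        # Shape checks on the standard codes; they do not look codes up.
        if not (len(self.country) == 2 and self.country.isalpha() and self.country.isupper()):
            raise ValueError(f"not an ISO 3166-1 alpha-2 code: {self.country!r}")
        if not (self.naics.isdigit() and 2 <= len(self.naics) <= 6):
            raise ValueError(f"NAICS codes are 2-6 digits: {self.naics!r}")
        if not 0 <= self.isced_level <= 8:
            raise ValueError(f"ISCED 2011 levels run 0-8: {self.isced_level}")


# P1 as it appears in this run's input.
P1 = SeekerProfile(
    pid="P1", name="Lena Park", persona="Mid-Senior IC Engineer",
    skills=("React.js", "JavaScript"), esco_ref="S6.0.2",
    industry="Software Publishers", naics="511210",
    country="US", remote=True,
    education="Bachelor or equivalent", isced_level=6,
)
```

Freezing the record keeps one run's input from being mutated mid-run, and validating in `__post_init__` means every copy made with `dataclasses.replace` is re-checked.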

Output · jobs returned

Title	Company	Location	Posted	Link
Senior Full Stack Developer	an AI-powered platform	—	—	open ↗
Fullstack Engineer	RYZ Labs	—	—	open ↗
Staff Full Stack Engineer	Assured	—	—	open ↗
Full-Stack Engineer	Elation	—	—	open ↗
Full Stack Product Engineer	Vanta	—	—	open ↗
Senior Frontend React Developer	Global	—	—	open ↗
Senior Frontend Developer (React)	Capco	—	—	open ↗
Senior Full-Stack Engineer	Human Agency	—	—	open ↗
Senior React Native Software Engineer (Javascript)	Bouncy	—	—	open ↗
Senior Full Stack Engineer	Cobalt AI	—	—	open ↗

4 ·THE BENCHMARK

vs. LinkedIn, Indeed, Google

Benchmark chart · 4 platforms × 7 axes · series: Metaintro (us), LinkedIn, Indeed, Google

Capability matrix · platforms × axes

Platform	Recognition	Specificity	Reachability	Recency	Match quality	Hostility filter	Salary surface	Overall
Metaintro ·us								65
LinkedIn								59
Indeed								53
Google								53

5 ·SESSION RECORDING

watch the probe drive each platform

No recording available for Metaintro.

6 ·RUN MECHANICS

provenance & reproducibility

Duration

1m 30.1s

Steps

Judges

—

Commit

rerun-real

Started

2026-05-28 14:22 UTC

Trace

Evidence by step

every artifact, link, excerpt, row, metric & recording — grouped by the step that produced it

No evidence recorded for this run.

Evidence integrity

each artifact is SHA-256 hashed at capture, so any later change can be detected by re-hashing it and comparing against the manifest

No integrity manifest recorded for this run.
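The integrity scheme described here (hash each artifact at capture, re-hash it later and compare) can be sketched with Python's standard `hashlib`. This is an illustration under stated assumptions: the manifest layout (a dict from artifact name to hex digest), the function names `build_manifest` and `verify`, and the sample artifacts are all hypothetical, since this run recorded no evidence or manifest. A matching digest shows an artifact is unchanged only relative to the manifest, so the manifest itself must be stored or signed somewhere trusted.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of one artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Hash every artifact at capture time (hypothetical shape: name -> digest)."""
    return {name: sha256_hex(data) for name, data in artifacts.items()}


def verify(artifacts: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return names of artifacts that are missing, unlisted, or whose bytes changed."""
    problems = []
    for name in sorted(set(artifacts) | set(manifest)):
        if name not in manifest or name not in artifacts:
            problems.append(name)  # unlisted in manifest, or missing from capture
        elif sha256_hex(artifacts[name]) != manifest[name]:
            problems.append(name)  # content differs from what was hashed
    return problems


# Hypothetical artifacts standing in for a run's captured evidence.
captured = {
    "jobs.json": b'[{"title": "Senior Frontend Developer (React)"}]',
    "screenshot.png": b"\x89PNG...",
}
manifest = build_manifest(captured)
print(verify(captured, manifest))   # → []

tampered = {**captured, "jobs.json": b"[]"}
print(verify(tampered, manifest))   # → ['jobs.json']
```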

7 ·SYSTEM ANATOMY

which component drove the verdict

Every component held — no failure attributed.
