ADR-0022 — Probe-fixture bootstrap and naming convention

Updated 2026-05-08 13:27 EDT

Status: Accepted
Date: 2026-05-08
Deciders: Natan
Source: Friction during PRD-03, where the operator had to manually create cohorts in Snappy and copy ObjectIds into .env after every dev DB reset. Fixture rot is a documented anti-pattern in Sam Newman, Building Microservices (2nd ed., ch. 10) and the Google SRE book (ch. 17 "Production Probes").

Context

PRD-03's domain-activation segment requires a tenant-scoped "Fortune 500 cohort" to drive a domain through. The original design pinned the cohort's MongoDB ObjectId via SQA_PROBE_COHORT_ID in .env. Two failure modes followed:

- Dev DB resets invalidate the ObjectId. Every reset means somebody opens the Snappy UI, recreates the cohort, and copies the new ObjectId back into .env. SQA-on-dev becomes a manual ritual. Synthetic probes are supposed to survive environment churn; this design failed at exactly the moment SQA matters.
- Names invented per call site drift. Without a convention, every new fixture (cohort, domain, future tenant if Snappy ever grows one) gets named on the spot. Three contributors will pick three patterns. Discoverability via grep and cleanup via "delete everything that looks like a probe artifact" both rot.

We considered four alternatives (recorded in CHANGELOG and the parent conversation):

1. Probe creates its own fixtures. The probe owns lifecycle.
2. Discovery-by-name with a separate make seed-probe-fixtures step.
3. Dedicated probe DB schema preserved across resets.
4. Skip segment 2 on dev/localhost.

Option 1 is the only one that survives every failure mode and keeps the dev experience zero-touch. Options 2–4 push the cost onto humans, infra, or coverage. ADR-0007 explicitly draws the SQA scope at operational observation, and there is a clean exception: fixtures the probe owns are part of the probe, not of the system. The S3 probe already works the same way: it writes to _sqa-probe/<key> and cleans up after itself.

Decision

Self-bootstrapping fixtures. The runner gets a dedicated segment between preflight (1) and domain-activation (3):

1   preflight              parallel — eight component probes
2   probe-fixtures         sequential — ensure SQA's fixtures exist
3   domain-activation      sequential — synthetic transaction

Step 2 runs only after preflight settles — fixture creation against a broken upstream is noise, not signal. The fixture-ensure component is idempotent: if the resource exists it returns the ID; if not, it creates and returns the ID. The ID flows to step 3 through a small in-process registry (src/lib/probe-registry.ts).
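The ordering above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the `Probe` and `SegmentReport` types and the `runPreflight` and `runAll` functions are hypothetical names, and skipping segments 2 and 3 when a preflight probe fails is an assumption about what "noise, not signal" implies.

```typescript
// Hypothetical types: this ADR does not specify the runner's real interfaces.
type Probe = () => Promise<void>;

interface SegmentReport {
  segment: string;
  ok: boolean;
  detail: string;
}

// Segment 1: start every component probe in parallel and wait until all of
// them have settled, whether they passed or failed.
async function runPreflight(probes: Probe[]): Promise<SegmentReport> {
  const results = await Promise.allSettled(probes.map((p) => p()));
  const failed = results.filter((r) => r.status === "rejected").length;
  return {
    segment: "preflight",
    ok: failed === 0,
    detail: `${results.length - failed}/${results.length} probes passed`,
  };
}

// Segments 2 and 3 run strictly in order, and only after preflight settles.
// Assumption: a failed preflight skips the later segments entirely.
async function runAll(
  preflight: Probe[],
  ensureFixtures: () => Promise<void>,
  domainActivation: () => Promise<void>,
): Promise<SegmentReport[]> {
  const pre = await runPreflight(preflight);
  const reports: SegmentReport[] = [pre];
  if (!pre.ok) return reports;

  const sequential: Array<[string, () => Promise<void>]> = [
    ["probe-fixtures", ensureFixtures],
    ["domain-activation", domainActivation],
  ];
  for (const [segment, step] of sequential) {
    try {
      await step();
      reports.push({ segment, ok: true, detail: "ok" });
    } catch (err) {
      reports.push({ segment, ok: false, detail: String(err) });
      break; // later segments depend on earlier ones
    }
  }
  return reports;
}
```

`Promise.allSettled` is used rather than `Promise.all` so that one failing component probe does not hide the results of the other seven.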

Naming convention. Every resource SQA creates in any system carries the prefix sqa-probe-. Single grep contract. No suffix, no encoding, no run-id baked into persistent fixtures.

| Lifetime | Pattern | Example |
| --- | --- | --- |
| Persistent | `sqa-probe-<resource>-<purpose>` | `sqa-probe-cohort-fortune-500` |
| Ephemeral | `sqa-probe-<resource>` (auto-id) | the probe domain at `auth0.com` etc. |
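One way to keep call sites from inventing names is to route them through a small helper. The sketch below covers only the persistent pattern; `persistentFixtureName` and `isProbeArtifact` are hypothetical names, and the lowercase kebab-case check is an assumption, not something this ADR mandates.

```typescript
const SQA_PROBE_PREFIX = "sqa-probe-";

// Persistent fixtures: sqa-probe-<resource>-<purpose>, with no run-id or suffix.
function persistentFixtureName(resource: string, purpose: string): string {
  // Assumption: each part is lowercase kebab-case.
  const kebab = /^[a-z0-9]+(-[a-z0-9]+)*$/;
  for (const part of [resource, purpose]) {
    if (!kebab.test(part)) {
      throw new Error(`invalid fixture name part: "${part}"`);
    }
  }
  return `${SQA_PROBE_PREFIX}${resource}-${purpose}`;
}

// The single grep contract: anything SQA created starts with the prefix.
function isProbeArtifact(name: string): boolean {
  return name.startsWith(SQA_PROBE_PREFIX);
}
```

For example, `persistentFixtureName("cohort", "fortune-500")` returns `sqa-probe-cohort-fortune-500`, and `isProbeArtifact("qa-cohort")` returns `false`.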

Env-var contract.

- SQA_PROBE_COHORT_ID becomes an optional override.
  - Empty (default) → step 2 self-bootstraps the cohort named sqa-probe-cohort-fortune-500. Found-or-created.
  - Non-empty → step 3 uses the supplied ID directly; step 2 still runs and reports its findings, but the override wins.
- SQA_PROBE_TENANT_ID and SQA_PROBE_TENANT_NAME are deleted. Snappy doesn't have a tenant resource; what PRD-03 called a "tenant" was a model error. The Organization concept in Snappy is something else (an evaluated company, not a workspace).
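The override rule for SQA_PROBE_COHORT_ID could look something like the sketch below. `resolveCohortId` and `CohortResolution` are hypothetical names. The environment is passed in as a plain object so the function stays testable, and treating a whitespace-only value as empty is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface CohortResolution {
  cohortId: string;
  source: "override" | "bootstrap";
}

// bootstrappedId is whatever step 2 found or created. Step 2 always runs,
// but a non-empty SQA_PROBE_COHORT_ID wins when step 3 picks its cohort.
function resolveCohortId(env: Env, bootstrappedId: string): CohortResolution {
  // Assumption: whitespace-only values count as empty.
  const override = (env["SQA_PROBE_COHORT_ID"] ?? "").trim();
  if (override !== "") {
    return { cohortId: override, source: "override" };
  }
  return { cohortId: bootstrappedId, source: "bootstrap" };
}
```

Returning the `source` alongside the ID lets the run report say which path was taken, which helps when an old override is still sitting in .env.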

Consequences

Behavioural

- After a fresh DB reset, make run creates the cohort on the first run and reuses it on every subsequent run. Zero manual setup. Zero ObjectIds in .env.
- The probe segment that creates fixtures runs every time. If the cohort already exists, the call is a single GET /cohorts plus a 100-cohort scan (Snappy's REST API has no slug filter today). The cost is bounded; the pattern is documented in ensure.ts.
- grep "sqa-probe" against logs, dashboards, or DB dumps tells any operator exactly what SQA created. Cleanup is one query: WHERE name LIKE 'sqa-probe-%'.
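The found-or-created behaviour described in this list can be sketched as below. This is not the contents of ensure.ts: the `CohortClient` interface, `ensureCohort`, and the in-memory `fakeClient` are hypothetical stand-ins for Snappy's REST calls, and the sketch assumes the fixture appears within the first 100 listed cohorts.

```typescript
interface Cohort {
  id: string;
  name: string;
}

// Hypothetical client surface; Snappy's real REST client is not shown here.
interface CohortClient {
  list(limit: number): Promise<Cohort[]>; // stands in for GET /cohorts
  create(name: string): Promise<Cohort>;  // stands in for a create call
}

const COHORT_NAME = "sqa-probe-cohort-fortune-500";
const SCAN_LIMIT = 100;

// Idempotent ensure: one list call, a linear scan by exact name,
// and a create only when nothing matched.
async function ensureCohort(
  client: CohortClient,
): Promise<{ id: string; created: boolean }> {
  const cohorts = await client.list(SCAN_LIMIT);
  const existing = cohorts.find((c) => c.name === COHORT_NAME);
  if (existing) return { id: existing.id, created: false };
  const made = await client.create(COHORT_NAME);
  return { id: made.id, created: true };
}

// In-memory fake for trying the sketch without a running Snappy.
function fakeClient(): CohortClient & { store: Cohort[] } {
  const store: Cohort[] = [];
  return {
    store,
    async list(limit: number): Promise<Cohort[]> {
      return store.slice(0, limit);
    },
    async create(name: string): Promise<Cohort> {
      const cohort = { id: `id-${store.length + 1}`, name };
      store.push(cohort);
      return cohort;
    },
  };
}
```

Calling `ensureCohort` repeatedly against the same fake creates the cohort once and returns the same ID afterwards. Two runs racing each other could still both miss the scan and both create, so the sketch is only idempotent for sequential runs.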

Architectural

- ADR-0007's "operational, not management" boundary is preserved with one explicit exception: SQA may create state that exists only so SQA can observe behavior, prefixed sqa-probe-, with SQA owning the lifecycle.
- The runner gains a small in-process module (src/lib/probe-registry.ts) for cross-segment state. Module-level Map; no DI machinery. Each make run is a fresh process, so cross-run pollution is impossible.
- Future M2M scenarios (e.g. a probe that needs a probe-org for a workflow test) follow the same pattern: a components/<thing>/ensure.ts returning the ID, segment 2 runs it, registry entry picked up by later segments.
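A module-level registry of the kind described in this list might look like the sketch below. It is not the actual src/lib/probe-registry.ts: `setFixtureId` and `getFixtureId` are hypothetical names, and in a real module they would be exported.

```typescript
// Module-level state: one Map per process, so every `make run` starts empty.
const registry = new Map<string, string>();

// Segment 2 records each fixture ID under a stable key.
function setFixtureId(key: string, id: string): void {
  registry.set(key, id);
}

// Later segments read the ID back. A missing key fails loudly, because it
// means the ensure step never ran or never registered its result.
function getFixtureId(key: string): string {
  const id = registry.get(key);
  if (id === undefined) {
    throw new Error(`probe fixture "${key}" is not registered`);
  }
  return id;
}
```

Throwing on a missing key, rather than returning `undefined`, keeps a skipped segment 2 from surfacing later as a confusing failure inside domain-activation.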

Falsifiability

- After a dev DB reset, make run should pass without any manual UI work. If the operator has to touch Snappy first, this ADR has failed.
- After 100 runs, exactly one sqa-probe-cohort-fortune-500 should exist (idempotency holds). If duplicates appear, the ensure step has a race or a slug mismatch.
- If a future contributor adds a _test, qa-, or unprefixed fixture, code review rejects it citing this ADR.