ADR-0022 — Probe-fixture bootstrap and naming convention
- Status: Accepted
- Date: 2026-05-08
- Deciders: Natan
- Source: Friction during PRD-03 — operator had to manually create
cohorts in Snappy and copy ObjectIds into .env after every dev DB reset. Fixture rot is a documented anti-pattern in Sam Newman, Building Microservices (2nd ed., ch. 10) and the Google SRE book (ch. 17 "Production Probes").
Context
PRD-03's domain-activation segment requires a tenant-scoped "Fortune 500 cohort" to drive a domain through. The original design pinned the cohort's MongoDB ObjectId via SQA_PROBE_COHORT_ID in .env. Two failure modes followed:
- Dev DB resets invalidate the ObjectId. Every reset means
somebody opens the Snappy UI, recreates the cohort, copies the new ObjectId back into .env. SQA-on-dev becomes a manual ritual. Synthetic probes are supposed to survive environment churn — this design failed at exactly the moment SQA matters.
- Names invented per call site drift. Without a convention,
every new fixture (cohort, domain, future tenant if Snappy ever grows one) gets named on the spot. Three contributors will pick three patterns. Discoverability via grep and cleanup via "delete everything that looks like a probe artifact" both rot.
We considered four alternatives (recorded in CHANGELOG and the parent conversation):
1. Probe creates its own fixtures. The probe owns lifecycle.
2. Discovery-by-name with a separate `make seed-probe-fixtures` step.
3. Dedicated probe DB schema preserved across resets.
4. Skip segment 2 on dev/localhost.
Option 1 is the only one that survives every failure mode and keeps the dev experience zero-touch. Options 2–4 push manual work onto humans, infra, or coverage. ADR-0007 explicitly draws the SQA scope at operational observation — and there's a clean exception: fixtures the probe owns are part of the probe, not of the system. Same way the S3 probe writes to _sqa-probe/<key> and cleans up after itself.
Decision
Self-bootstrapping fixtures. The runner gets a dedicated segment between preflight (1) and domain-activation (3):
| Step | Segment | Mode | Purpose |
|---|---|---|---|
| 1 | preflight | parallel | eight component probes |
| 2 | probe-fixtures | sequential | ensure SQA's fixtures exist |
| 3 | domain-activation | sequential | synthetic transaction |

Step 2 runs only after preflight settles, because fixture creation against a broken upstream is noise, not signal. The fixture-ensure component is idempotent: if the resource exists, it returns the ID; if not, it creates the resource and returns the new ID. The ID flows to step 3 through a small in-process registry (src/lib/probe-registry.ts).
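The segment ordering can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the `Result` shape, the `Probe` type, and the `settle` and `runSegments` names are all hypothetical, and skipping segments 2 and 3 after a failed preflight is an assumption drawn from the "noise, not signal" rationale.

```typescript
// Hypothetical result envelope; the real shape is defined in ADR-0012.
type Result = { name: string; ok: boolean; detail?: string };
type Probe = { name: string; run: () => Promise<Result> };

// Run one probe, turning a thrown error into a failed Result so a single
// broken component cannot reject the whole segment.
async function settle(probe: Probe): Promise<Result> {
  try {
    return await probe.run();
  } catch (err) {
    return { name: probe.name, ok: false, detail: String(err) };
  }
}

async function runSegments(
  preflight: Probe[],
  ensureFixtures: Probe,
  domainActivation: Probe,
): Promise<Result[]> {
  // Segment 1: component probes run in parallel.
  const results = await Promise.all(preflight.map(settle));

  // Assumption: a failed preflight skips segments 2 and 3.
  if (results.some((r) => !r.ok)) return results;

  // Segment 2: fixtures are ensured before anything depends on them.
  const fixtures = await settle(ensureFixtures);
  results.push(fixtures);
  if (!fixtures.ok) return results;

  // Segment 3: the synthetic transaction.
  results.push(await settle(domainActivation));
  return results;
}
```

Converting thrown errors into failed results keeps one unreachable component from hiding the status of the other preflight probes.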
Naming convention. Every resource SQA creates in any system carries the prefix sqa-probe-. Single grep contract. No suffix, no encoding, no run-id baked into persistent fixtures.
| Lifetime | Pattern | Example |
|---|---|---|
| Persistent | sqa-probe-<resource>-<purpose> | sqa-probe-cohort-fortune-500 |
| Ephemeral | sqa-probe-<resource> (auto-id) | the probe domain at auth0.com etc. |
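As a rough illustration of the persistent pattern in the table, a name builder and a prefix check might look like the sketch below. The helper names (`persistentFixtureName`, `isProbeArtifact`) are hypothetical, and validating each segment as lowercase kebab-case is an assumption based on the casing rule attributed to ADR-0021.

```typescript
const PROBE_PREFIX = "sqa-probe-";

// Assumed rule: each segment is lowercase kebab-case, e.g. "cohort" or "fortune-500".
const SEGMENT = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Builds a persistent fixture name: sqa-probe-<resource>-<purpose>.
function persistentFixtureName(resource: string, purpose: string): string {
  for (const part of [resource, purpose]) {
    if (!SEGMENT.test(part)) {
      throw new Error(`not lowercase kebab-case: "${part}"`);
    }
  }
  return `${PROBE_PREFIX}${resource}-${purpose}`;
}

// The single grep contract: anything SQA created starts with the prefix.
function isProbeArtifact(name: string): boolean {
  return name.startsWith(PROBE_PREFIX);
}
```

Routing every call site through one builder is one way to keep names from being invented on the spot.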
Env-var contract.
- `SQA_PROBE_COHORT_ID` becomes an optional override.
  - Empty (default) → step 2 self-bootstraps the cohort named `sqa-probe-cohort-fortune-500`. Found-or-created.
  - Non-empty → step 3 uses the supplied ID directly; step 2 still runs and reports its findings, but the override wins.
- `SQA_PROBE_TENANT_ID` and `SQA_PROBE_TENANT_NAME` are deleted. Snappy doesn't have a tenant resource; what PRD-03 called a "tenant" was a model error. The Organization concept in Snappy is something else (an evaluated company, not a workspace).
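The override rule can be expressed as a small pure function. This is a sketch under stated assumptions: `resolveCohortId` and the `source` field are hypothetical names, and treating a whitespace-only value as empty is an assumption the ADR does not spell out.

```typescript
type CohortSource = "override" | "bootstrap";

// Picks the cohort ID step 3 should use. `bootstrappedId` is whatever
// step 2 found or created; the env override wins when it is non-empty.
function resolveCohortId(
  env: Record<string, string | undefined>,
  bootstrappedId: string,
): { id: string; source: CohortSource } {
  // Assumption: whitespace-only values count as empty.
  const override = (env["SQA_PROBE_COHORT_ID"] ?? "").trim();
  return override !== ""
    ? { id: override, source: "override" }
    : { id: bootstrappedId, source: "bootstrap" };
}
```

In a Node runner the first argument would typically be `process.env`. Returning the source alongside the ID lets the report show which path was taken.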
Consequences
Behavioural
- After a fresh DB reset, `make run` creates the cohort on the first run and reuses it on every subsequent run. Zero manual setup. Zero ObjectIds in .env.
- The probe segment that creates fixtures runs every time. If the cohort already exists, the call is a single GET /cohorts plus a 100-cohort scan (Snappy's REST API has no slug filter today). The cost is bounded; the pattern is documented in ensure.ts.
- `grep "sqa-probe"` against logs, dashboards, or DB dumps tells any operator exactly what SQA created. Cleanup is one query: WHERE name LIKE 'sqa-probe-%'.
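The find-or-create behaviour described above might be sketched like this. It is not the contents of ensure.ts: the `CohortApi` interface, its method names, and the `created` flag are hypothetical stand-ins for whatever client the probe really uses, with `list` playing the role of GET /cohorts.

```typescript
type Cohort = { id: string; name: string };

// Hypothetical client interface; the real Snappy client may look different.
interface CohortApi {
  list(limit: number): Promise<Cohort[]>; // stands in for GET /cohorts
  create(name: string): Promise<Cohort>;
}

// Upper bound on the client-side scan, since there is no server-side filter.
const SCAN_LIMIT = 100;

// Idempotent ensure: return the existing cohort's ID, or create one.
async function ensureCohort(
  api: CohortApi,
  name: string,
): Promise<{ id: string; created: boolean }> {
  const cohorts = await api.list(SCAN_LIMIT);
  const existing = cohorts.find((c) => c.name === name);
  if (existing) return { id: existing.id, created: false };

  const created = await api.create(name);
  return { id: created.id, created: true };
}
```

One caveat of this sketch: if an environment ever holds more cohorts than the scan covers, the fixture could be missed and created a second time.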
Architectural
- ADR-0007's "operational, not management" boundary is preserved with one explicit exception: SQA may create state that exists only so SQA can observe behavior, prefixed sqa-probe-, with SQA owning the lifecycle.
- The runner gains a small in-process module (src/lib/probe-registry.ts) for cross-segment state. Module-level Map; no DI machinery. Each `make run` is a fresh process, so cross-run pollution is impossible.
- Future M2M scenarios (e.g. a probe that needs a probe-org for a workflow test) follow the same pattern: a components/<thing>/ensure.ts returning the ID, segment 2 runs it, registry entry picked up by later segments.
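A module-level registry of the kind described could be as small as the sketch below. The function names are hypothetical and the real src/lib/probe-registry.ts may expose a different surface; in an actual module these functions would be exported.

```typescript
// Module-level state: lives for one process, which means one `make run`.
const registry = new Map<string, string>();

// Called by segment 2 after a fixture is found or created.
function setFixtureId(key: string, id: string): void {
  registry.set(key, id);
}

// Returns undefined when no segment has registered the fixture yet.
function getFixtureId(key: string): string | undefined {
  return registry.get(key);
}

// For later segments that cannot proceed without the fixture.
function requireFixtureId(key: string): string {
  const id = registry.get(key);
  if (id === undefined) {
    throw new Error(`fixture "${key}" was not registered by segment 2`);
  }
  return id;
}
```

A throwing accessor makes a missing fixture fail loudly in segment 3 instead of passing `undefined` to the synthetic transaction.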
Falsifiability
- After a dev DB reset, `make run` should pass without any manual UI work. If the operator has to touch Snappy first, this ADR has failed.
- After 100 runs, exactly one `sqa-probe-cohort-fortune-500` should exist (idempotency holds). If duplicates appear, the ensure step has a race or a slug mismatch.
- If a future contributor adds a `_test`, `qa-`, or unprefixed fixture, code review rejects it citing this ADR.
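The duplicate check in the second criterion could be automated along these lines. This is a hedged sketch: `duplicateProbeFixtures` is a hypothetical helper, and it assumes the caller has already fetched the list of resource names from the system under test.

```typescript
// Returns every probe-owned fixture name that occurs more than once.
// An empty result means idempotency held for the names supplied.
function duplicateProbeFixtures(names: string[]): string[] {
  const counts = new Map<string, number>();
  for (const name of names) {
    if (!name.startsWith("sqa-probe-")) continue; // not created by SQA
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  // Keep the first occurrence of each duplicated name.
  return names.filter(
    (n, i) => (counts.get(n) ?? 0) > 1 && names.indexOf(n) === i,
  );
}
```

Non-probe names are ignored on purpose, so duplicates among real data never count against the ADR.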
See also
- ADR-0007 — the
scope boundary this ADR carves out an exception within.
- ADR-0012 — the
Result envelope every probe (including ensure) returns.
- ADR-0021 — the casing
rule the prefix follows (kebab-case lowercase).
- PRD-03: the segment this ADR makes self-bootstrapping.
- Sam Newman, Building Microservices (2nd ed.) ch. 10
"Semantic Monitoring" — synthetic transactions need stable fixtures.
- Google, SRE Book ch. 17 "Production Probes": probes should be self-contained; environmental coupling is a bug.