bugfix-newfeature-qa-ultra — usage & iteration guide

🟢 the 2 human moments · 🔵 automatic phase · 🟡 a mechanical guard · 🟣 a place you iterate the skill

When to fire it

Hand it scrambled input and let it run. Two human touches: sign the EBO, accept the result.

Fire it when…	What you give it	Don't, when…
A bug-report chat went long and you want it driven to a merged, QA-proven fix	The chat transcript (`.jsonl`)	It's a trivial one-file edit — just do it
You have a rambling feature idea and want it built + QA'd hands-off	The ramble (`--text` / chat)	You want to plan ONE fuzzy feature from scratch → use `plan-orchestrate` (the sequential sibling)
Several scoped fixes need isolating, merging + live-QA-proving as one build	The asks (it creates + merges its own branches)	You want to merge someone else's pre-built branches — there is no such mode (single-purpose)

The one upfront human moment — the EBO sign-off BLOCKING

The Expected-Behavior Oracle is the answer key the whole run is judged against. You sign it once, at the start. Until you do, nothing builds.

What you actually do: open the EBO (in EBO.md or its Notion page), fix anything wrong, add anything missing, delete rows that don't apply, set Status → 🟢 Signed off, and reply "done". That reply is your signature: compile_ebo.py mechanically refuses (exit 7) to compile any QA slice until ebo.signed exists, so no builder can ever go green against an un-reviewed oracle. The skill never self-signs.
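A minimal sketch of what such a gate could look like, assuming the marker file ebo.signed lives directly in the run directory (the real layout inside compile_ebo.py is not shown in this guide; the function name `signoff_gate` is hypothetical):

```python
import sys
from pathlib import Path

EXIT_UNSIGNED = 7  # the exit code this guide names for an unsigned oracle


def signoff_gate(run_dir: Path) -> int:
    """Return 0 if the sign-off marker exists in run_dir, else EXIT_UNSIGNED."""
    marker = run_dir / "ebo.signed"  # assumed location of the marker file
    if marker.is_file():
        return 0
    print(f"refusing to compile: {marker} not found", file=sys.stderr)
    return EXIT_UNSIGNED
```

A caller would pass the result to `sys.exit()`, so the refusal is visible to whatever invoked the compile step.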

What runs between your two touches — the 9 phases

🔵 0–2 · Intake → EBO → Confirm

transcript → intent.json → /user-journeys EBO → 🟢 sign

Parses type:"user" AND queued_command (the follow-ups a naive parser drops). Authors the oracle via /user-journeys, mirrors to Notion, blocks on your sign-off.
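A rough sketch of the "don't drop the follow-ups" idea. It assumes each transcript line is a JSON object, that queued_command appears as a value of the type field, and that the message text sits under a hypothetical `text` key. The real .jsonl schema and the logic in intake_extract.py may differ.

```python
import json
from typing import Iterable, List

# Record types treated as user asks; a naive parser would keep only "user".
WANTED_TYPES = {"user", "queued_command"}


def extract_asks(lines: Iterable[str]) -> List[str]:
    """Collect the text of every user or queued-command record, in order."""
    asks: List[str] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the intake
        if not isinstance(record, dict):
            continue
        if record.get("type") in WANTED_TYPES:
            text = record.get("text")  # hypothetical key name
            if isinstance(text, str) and text.strip():
                asks.append(text.strip())
    return asks
```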

🔵 3 · Decompose

EBO rows → tasks → pick model → compile slices

Tasks that touch a god-module (flows.py/dash.py) are serialised; file-disjoint work fans out. Each slice is read-only and never guesses a probe path.
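The serialise-or-fan-out rule can be sketched as below. This is an illustration, not the contents of group_tasks.py: the function name `plan_lanes`, the input shape (task name → set of file paths), and the choice to run the serial lane after the parallel wave are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Set

GOD_MODULES = {"flows.py", "dash.py"}  # the two god-modules named in this guide


def plan_lanes(tasks: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Split tasks into a parallel wave and a one-at-a-time serial lane."""
    serial: List[str] = []
    parallel: List[str] = []
    claimed: Set[str] = set()  # files already owned by a parallel task
    for name in sorted(tasks):
        files = tasks[name]
        touches_god = any(PurePosixPath(f).name in GOD_MODULES for f in files)
        if touches_god or files & claimed:
            serial.append(name)  # overlaps something: must not run concurrently
        else:
            parallel.append(name)
            claimed |= files
    return {"serial": serial, "parallel": parallel}
```

Tasks in the parallel list are pairwise file-disjoint by construction, so they can each take their own worktree without stepping on one another.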

🔵 4 · Fan-out

builder (worktree+TDD) → Sonnet QA twin → return + qa

One builder per task on its own branch; a Sonnet QA twin (no shared context) runs visual-qa-ultra on the slice + audits each test for "theater". board-card-qa during, visual-qa-ultra after.

🔵 5–7 · Converge → Final QA → Fix loop

bundle backup → --no-ff merge → whole-EBO QA → bug → fix → fresh QA

Sequential merges, suite green after each (red ⇒ STOP, never ship red). Final QA over the whole EBO catches cross-feature bugs; each fix is re-verified in a fresh QA context.
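The converge recipe (backup, then merge one branch at a time, suite after each, stop on red) might look roughly like this. The command runner is injected so the sketch stays testable. The suite command `pytest -q`, the bundle filename, and the function name `converge` are assumptions, not the skill's actual recipe.

```python
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]  # runs a command, returns its exit code


def converge(
    branches: Sequence[str],
    run: Runner,
    suite_cmd: Sequence[str] = ("pytest", "-q"),  # assumed test command
    bundle_path: str = "backup.bundle",           # assumed backup filename
) -> List[str]:
    """Merge branches sequentially; raise on the first failure (never ship red)."""
    # Back up every ref first so a bad merge is always recoverable.
    if run(["git", "bundle", "create", bundle_path, "--all"]) != 0:
        raise RuntimeError("bundle backup failed; nothing merged")
    merged: List[str] = []
    for branch in branches:
        if run(["git", "merge", "--no-ff", "--no-edit", branch]) != 0:
            raise RuntimeError(f"merge of {branch} failed; STOP")
        if run(list(suite_cmd)) != 0:
            raise RuntimeError(f"suite red after {branch}; STOP")
        merged.append(branch)
    return merged
```

In real use the runner could be as small as `lambda cmd: subprocess.run(cmd).returncode`; nothing in the sketch pushes to a remote.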

Caps: ≤3 concurrent builders · ≤4 product-fix attempts/row · ≤3 whole-build cycles · ≤5 question rounds/agent.

The heartbeat (long runs)

On a long run the orchestrator sets a 30-minute heartbeat (ScheduleWakeup(1800s)). On each beat it appends a checkpoint line and resumes from run_state.json if anything stalled. A background wait that times out while work is still running is not a stop — it re-issues. You can ignore the run until it pings you for the EBO sign-off (start) or the accept (end). Check progress any time with init_run.py status --run-id <id>.

The second human moment — reading the final report & accepting

The run ends at PENDING_REVIEW — it will not push until you accept.

In the report, look for…	What it means
The verdict table (behavior → PASS / STATE_MISSING / BLOCKED → the one deciding fact)	Every EBO row + the single observation that settled it. PASS only on a LIVE frame + a real state delta.
run-trust = scenario_confidence × oracle_coverage	Trust in the asserted invariants — never "the system works". `PASSED_WITH_GAPS` whenever coverage < 100%.
residual human TODOs (bucket-d rows)	The genuinely-can't-automate residue (e.g. a real Stripe LIVE charge) for you to finish by hand.
probe-path TODOs in a slice	A state row the compiler refused to guess a path for — fill it to deepen the oracle (see "iterate" below).
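A worked example of the run-trust arithmetic, as a sketch. It assumes oracle_coverage is the fraction of EBO rows that actually got an assertion, and uses `PASSED` as a placeholder label for the full-coverage case; neither detail is specified in this guide.

```python
from typing import Tuple


def run_trust(
    scenario_confidence: float, rows_total: int, rows_asserted: int
) -> Tuple[float, str]:
    """Return (run-trust, label) where run-trust = confidence x coverage."""
    if rows_total <= 0:
        raise ValueError("rows_total must be positive")
    oracle_coverage = rows_asserted / rows_total  # assumed definition
    trust = scenario_confidence * oracle_coverage
    # Anything short of full coverage is reported with gaps, per the guide.
    label = "PASSED_WITH_GAPS" if oracle_coverage < 1.0 else "PASSED"
    return trust, label
```

With a confidence of 0.9 and 8 of 10 rows asserted, coverage is 0.8 and run-trust comes out at about 0.72, labelled `PASSED_WITH_GAPS`.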

Accepting: when the report is clean, the skill announces the integration + feature branches and pushes only then. main is updated only on a separate, explicit trigger. Nothing reaches a remote before you say so.

The mechanical guards (the locked Q1–Q10 decisions) 🟡

Guard	What it prevents	Where it lives
Never-guess-path compiler (Q2)	False greens — a test passing while the data under it is wrong	`compile_ebo.py` (known-path allowlist + `_todo`)
BLOCKING sign-off gate (Q7)	Building against an un-reviewed oracle	`compile_ebo.py` exit 7 until `ebo.signed`
QA twin can't self-sign + audits tests (Q4)	"Test theater" (null/exists/`assert True`) going green	QA twin wrapper (always Sonnet, no shared context)
Serialise god-module touchers (Q5)	Two builders trashing `flows.py`/`dash.py` in parallel	`group_tasks.py`
Bundle backup + suite-after-each + STOP-on-red (Q9)	Shipping a red merge; an unrecoverable merge	converge recipe (wrapper §E)
No push before accept (Q10)	A remote getting an unreviewed build	`run_state.pushed`, Phase 8
AUTOSEND on · sentinel-gate · send-blocklist · warmup-redact	Emailing a real lead; flipping a live switch; leaking warmup mail	safety invariants, every phase
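To make the never-guess-path guard (Q2) concrete, here is a small sketch of an allowlist lookup that emits a `_todo` instead of inventing a path. The field names, probe paths, and the function `compile_probe` are hypothetical; the real allowlist is `DEFAULT_KNOWN_PATHS` in compile_ebo.py, whose contents are not shown in this guide.

```python
from typing import Dict

# Hypothetical allowlist entries, for illustration only.
KNOWN_PATHS_EXAMPLE: Dict[str, str] = {
    "status": "deal.status",
    "neg_reply": "lead.flags.neg_reply",
}


def compile_probe(field: str, known_paths: Dict[str, str]) -> Dict[str, str]:
    """Emit a probe for an allowlisted field, otherwise a _todo (never a guess)."""
    path = known_paths.get(field)
    if path is None:
        return {"field": field, "_todo": f"no known probe path for '{field}'"}
    return {"field": field, "probe": path}
```

A row that comes back with `_todo` cannot go green by accident: it carries no probe at all until someone fills the path in.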

How to iterate on the skill itself 🟣

Every seam is a deterministic script with a pinned self-test. Change the constant, add a self-test assertion, re-run python3 scripts/selftest.py.

You want to…	Do this
Add a bucket-(c) state-setup helper (a precondition with no UI, like `neg_reply=true`)	`state_setup.py scaffold --spec precond.json` → fill the TODO click path (verify each selector live — never guess) → wire it into the QA twin's run before the live drive
Tune the Sonnet→Opus escalation thresholds	Edit the trigger constants in `pick_model.py` `decide()` (≥4 files / ≥120 LOC / ≥5 rows / risk flags); the self-test cases pin current behavior
Stop the intake parser missing a new kind of ask, or leaking a secret	Add to `intake_extract.py` `SKILL_DUMP_PREFIXES` / `NOISE_TAG_RE` / `REDACTIONS` + a self-test assertion so the miss can't recur
Make a new state field count as an "obvious" probe path	Add it to `compile_ebo.py` `DEFAULT_KNOWN_PATHS` (or pass `--known-paths`) — this is what makes "obvious path" mechanical
Improve file resolution for decomposition	Supply `--surface-map` from `graphify path` for exact resolution, or extend `group_tasks.py` `GOD_MODULES`
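For the escalation row above, a sketch of how threshold-driven model choice could be written. The numbers come from this guide; the OR-combination of triggers, the return shape, and the constant names are assumptions about pick_model.py, not its actual code.

```python
from typing import List, Sequence, Tuple

# Thresholds as listed in this guide; the constant names are hypothetical.
MIN_FILES = 4
MIN_LOC = 120
MIN_ROWS = 5


def decide(
    files: int, loc: int, rows: int, risk_flags: Sequence[str] = ()
) -> Tuple[str, List[str]]:
    """Return (model, reasons); escalate when any one trigger fires (assumed)."""
    reasons: List[str] = []
    if files >= MIN_FILES:
        reasons.append(f"files>={MIN_FILES}")
    if loc >= MIN_LOC:
        reasons.append(f"loc>={MIN_LOC}")
    if rows >= MIN_ROWS:
        reasons.append(f"rows>={MIN_ROWS}")
    reasons.extend(f"risk:{flag}" for flag in risk_flags)
    return ("opus" if reasons else "sonnet", reasons)
```

Returning the reasons alongside the model makes it easy to pin current behavior in a self-test: change a constant and the boundary cases fail until the assertions are updated.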

Proof it works (last validation run)

Check	Result
Booked-card transcript replay (intake→EBO→decompose→compile→builder/QA contracts→PENDING_REVIEW)	22/22 PASS — extracted the queued V1/V2, classified the row (b)+(c), gate refused unsigned, never-guess TODO on the prose row, no AUTOSEND flip / no real deals / no push
Converge machinery vs the 4 existing branches (bundle backup → SHA-pin → sequential --no-ff → suite-after-each → no push)	12/12 PASS — real merges, suite green (392→407) after each merged state, isolated worktree cleaned up
Offline self-tests (`scripts/selftest.py`)	ALL PASS — 6 unit + 7 end-to-end (both guards)

bugfix-newfeature-qa-ultra · regenerate this guide: python3 ~/.claude/skills/bugfix-newfeature-qa-ultra/references/build_guide.py · machine-facing design doc: ARCHITECTURE.md · context report: ebo-factory-context.pages.dev