ClientsFlow · Skill guide · 2026-06-23
The autonomous bug-fix / feature factory — how to use it, and how to iterate on it. Scrambled input → EBO → one human sign-off → built + merged + QA-proven on a safe branch → PENDING_REVIEW.
Hand it scrambled input and let it run. Two human touches: sign the EBO, accept the result.
| Fire it when… | What you give it | Don't, when… |
|---|---|---|
| A bug-report chat went long and you want it driven to a merged, QA-proven fix | The chat transcript (.jsonl) | It's a trivial one-file edit — just do it |
| You have a rambling feature idea and want it built + QA'd hands-off | The ramble (--text / chat) | You want to plan ONE fuzzy feature from scratch → use plan-orchestrate (the sequential sibling) |
| Several scoped fixes need isolating, merging + live-QA-proving as one build | The asks (it creates + merges its own branches) | You want to merge someone else's pre-built branches — there is no such mode (single-purpose) |
The Expected-Behavior Oracle (EBO) is the answer key the whole run is judged against. You sign it once, at the start. Until you do, nothing builds.
Open the EBO (EBO.md or its Notion page), fix anything wrong, add anything missing, delete rows that don't apply, set Status → 🟢 Signed off, and reply done. That reply is your signature: compile_ebo.py mechanically refuses (exit 7) to compile any QA slice until ebo.signed exists, so no builder can ever go green against an un-reviewed oracle. The skill never self-signs.

…type:"user" AND queued_command (the follow-ups a naive parser drops). Authors the oracle via /user-journeys, mirrors to Notion, blocks on your sign-off.

God-module (flows.py/dash.py) touchers serialise; file-disjoint work fans out. Each slice is read-only and never guesses a probe path.

…visual-qa-ultra on the slice + audits each test for "theater". board-card-qa during, visual-qa-ultra after.

Caps: ≤3 concurrent builders · ≤4 product-fix attempts/row · ≤3 whole-build cycles · ≤5 question rounds/agent.
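The sign-off gate can be pictured with a minimal sketch. This is not the actual compile_ebo.py: the function name `check_signed` and the assumption that ebo.signed sits directly in the run directory are illustrative only. Only the marker name and exit code 7 come from this guide.

```python
import sys
from pathlib import Path

EXIT_UNSIGNED = 7  # refusal code this guide attributes to compile_ebo.py


def check_signed(run_dir: Path) -> int:
    """Return 0 when the sign-off marker exists, else the refusal code.

    Hypothetical helper: assumes the marker is a file named ebo.signed
    located directly inside run_dir.
    """
    marker = run_dir / "ebo.signed"
    if not marker.exists():
        print(f"refusing to compile: {marker} not found", file=sys.stderr)
        return EXIT_UNSIGNED
    return 0
```

A caller would pass the result to `sys.exit()` before compiling any QA slice, so an unsigned oracle stops the pipeline before a builder starts.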
On a long run the orchestrator sets a 30-minute heartbeat (ScheduleWakeup(1800s)). On each beat it appends a checkpoint line and resumes from run_state.json if anything stalled. A background wait that times out while work is still running is not a stop — it re-issues. You can ignore the run until it pings you for the EBO sign-off (start) or the accept (end). Check progress any time with init_run.py status --run-id <id>.
The run ends at PENDING_REVIEW — it will not push until you accept.
| In the report, look for… | What it means |
|---|---|
| The verdict table (behavior → PASS / STATE_MISSING / BLOCKED → the one deciding fact) | Every EBO row + the single observation that settled it. PASS only on a LIVE frame + a real state delta. |
| run-trust = scenario_confidence × oracle_coverage | Trust in the asserted invariants — never "the system works". PASSED_WITH_GAPS whenever coverage < 100%. |
| residual human TODOs (bucket-d rows) | The genuinely-can't-automate residue (e.g. a real Stripe LIVE charge) for you to finish by hand. |
| probe-path TODOs in a slice | A state row the compiler refused to guess a path for — fill it to deepen the oracle (see "iterate" below). |
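The run-trust row above is a plain product, which a small sketch makes concrete. The function name `run_trust` and the assumption that both inputs are fractions in [0, 1] are illustrative; the guide only states the formula and the PASSED_WITH_GAPS rule.

```python
def run_trust(scenario_confidence: float, oracle_coverage: float) -> tuple[float, bool]:
    """Return (trust, has_gaps) for one run.

    Assumes both inputs are fractions between 0 and 1. has_gaps is True
    whenever coverage is below 100%, which is when the guide says the
    report reads PASSED_WITH_GAPS.
    """
    for name, value in (("scenario_confidence", scenario_confidence),
                        ("oracle_coverage", oracle_coverage)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    trust = scenario_confidence * oracle_coverage
    has_gaps = oracle_coverage < 1.0
    return trust, has_gaps
```

Because the two factors multiply, a confident run over a half-covered oracle still scores low. That matches the guide's point that the number expresses trust in the asserted invariants only.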
| Guard | What it prevents | Where it lives |
|---|---|---|
| Never-guess-path compiler (Q2) | False greens — a test passing while the data under it is wrong | compile_ebo.py (known-path allowlist + _todo) |
| BLOCKING sign-off gate (Q7) | Building against an un-reviewed oracle | compile_ebo.py exit 7 until ebo.signed |
| QA twin can't self-sign + audits tests (Q4) | "Test theater" (null/exists/assert True) going green | QA twin wrapper (always Sonnet, no shared context) |
| Serialise god-module touchers (Q5) | Two builders trashing flows.py/dash.py in parallel | group_tasks.py |
| Bundle backup + suite-after-each + STOP-on-red (Q9) | Shipping a red merge; an unrecoverable merge | converge recipe (wrapper §E) |
| No push before accept (Q10) | A remote getting an unreviewed build | run_state.pushed, Phase 8 |
| AUTOSEND on · sentinel-gate · send-blocklist · warmup-redact | Emailing a real lead; flipping a live switch; leaking warmup mail | safety invariants, every phase |
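The never-guess-path guard (Q2) is roughly an allowlist lookup that emits a TODO instead of inventing a path. The sketch below assumes a dictionary-shaped allowlist and a `_todo` key on the output row. The path string, the function name `compile_row`, and the row layout are hypothetical, not taken from compile_ebo.py.

```python
# Hypothetical allowlist: state field -> probe path that is known to be correct.
KNOWN_PATHS = {"neg_reply": "lead.neg_reply"}


def compile_row(field: str, known_paths: dict[str, str]) -> dict:
    """Compile one state row without ever guessing a probe path."""
    path = known_paths.get(field)
    if path is None:
        # Unknown field: leave the path empty and flag it for a human,
        # rather than producing a test that could pass on the wrong data.
        return {
            "field": field,
            "probe_path": None,
            "_todo": f"probe path for {field!r} is unknown; fill it in by hand",
        }
    return {"field": field, "probe_path": path}
```

A row that comes back with `_todo` set is the "probe-path TODO" that surfaces in the final report.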
Every seam is a deterministic script with a pinned self-test. Change the constant, add a self-test assertion, re-run python3 scripts/selftest.py.
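As a sketch of the "change the constant, pin it with an assertion" loop, here is a hypothetical redaction table with its self-test. The pattern, the replacement token, and the function names are assumptions for illustration; the real REDACTIONS list in intake_extract.py is not shown in this guide.

```python
import re

# Hypothetical redaction table: (compiled pattern, replacement token).
REDACTIONS = [
    (re.compile(r"sk_(?:live|test)_[A-Za-z0-9]+"), "[REDACTED_STRIPE_KEY]"),
]


def redact(text: str) -> str:
    """Apply every redaction pattern in order and return the cleaned text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def selftest() -> bool:
    """Pin current behavior so a later edit to REDACTIONS cannot regress it."""
    assert redact("key sk_live_abc123 here") == "key [REDACTED_STRIPE_KEY] here"
    assert redact("no secrets in this line") == "no secrets in this line"
    return True
```

Adding a new pattern then means one new tuple in the table and one new assertion in `selftest()`.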
| You want to… | Do this |
|---|---|
| Add a bucket-(c) state-setup helper (a precondition with no UI, like neg_reply=true) | state_setup.py scaffold --spec precond.json → fill the TODO click path (verify each selector live; never guess) → wire it into the QA twin's run before the live drive |
| Tune the Sonnet→Opus escalation thresholds | Edit the trigger constants in pick_model.py decide() (≥4 files / ≥120 LOC / ≥5 rows / risk flags); the self-test cases pin current behavior |
| Stop the intake parser missing a new kind of ask, or leaking a secret | Add to intake_extract.py SKILL_DUMP_PREFIXES / NOISE_TAG_RE / REDACTIONS + a self-test assertion so the miss can't recur |
| Make a new state field count as an "obvious" probe path | Add it to compile_ebo.py DEFAULT_KNOWN_PATHS (or pass --known-paths) — this is what makes "obvious path" mechanical |
| Improve file resolution for decomposition | Supply --surface-map from graphify path for exact resolution, or extend group_tasks.py GOD_MODULES |
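The escalation thresholds in the table above could look like the sketch below. The threshold values come from this guide, but the constant names, the `Task` dataclass, the lowercase model labels, and the assumption that any single trigger is enough to escalate are illustrative rather than a description of pick_model.py.

```python
from dataclasses import dataclass

# Trigger constants as listed in the guide; the names are hypothetical.
FILES_TRIGGER = 4
LOC_TRIGGER = 120
ROWS_TRIGGER = 5


@dataclass(frozen=True)
class Task:
    files: int                        # files the task touches
    loc: int                          # estimated lines of code changed
    rows: int                         # EBO rows the task covers
    risk_flags: tuple[str, ...] = ()  # any risk flag forces escalation


def decide(task: Task) -> str:
    """Return "opus" when any trigger fires, else "sonnet" (assumed OR logic)."""
    escalate = (
        task.files >= FILES_TRIGGER
        or task.loc >= LOC_TRIGGER
        or task.rows >= ROWS_TRIGGER
        or bool(task.risk_flags)
    )
    return "opus" if escalate else "sonnet"
```

Keeping the thresholds as named module-level constants means a tuning change is one edit plus one updated self-test case.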
| Check | Result |
|---|---|
| Booked-card transcript replay (intake→EBO→decompose→compile→builder/QA contracts→PENDING_REVIEW) | 22/22 PASS — extracted the queued V1/V2, classified the row (b)+(c), gate refused unsigned, never-guess TODO on the prose row, no AUTOSEND flip / no real deals / no push |
| Converge machinery vs the 4 existing branches (bundle backup → SHA-pin → sequential --no-ff → suite-after-each → no push) | 12/12 PASS — real merges, suite green (392→407) after each merged state, isolated worktree cleaned up |
| Offline self-tests (scripts/selftest.py) | ALL PASS: 6 unit + 7 end-to-end (both guards) |