refactor: compress 14-agent team to 7 with wave-based parallelism

- Merge grunt + worker + senior-worker → worker (model scaled by orchestrator)
- Merge code-reviewer + karen → reviewer (quality + claim verification)
- Merge security-auditor + verification → auditor (security + runtime, background)
- Architect absorbs requirements-analyst + decomposer (two-phase: triage then plan)
- Rename docs-writer → documenter
- Remove review-coordinator (logic absorbed into orchestrate skill)
- Orchestrate skill: wave-based dispatch, parallelism as hard protocol requirement
  with explicit cost rationale (~10% token cost for shared cached context)
Bryan Ramos 2026-04-01 22:09:30 -04:00
parent 7274e79e00
commit 5f534cbc64
16 changed files with 398 additions and 835 deletions


@@ -1,6 +1,6 @@
---
name: orchestrate
description: Orchestration framework for decomposing and delegating complex tasks to the agent team. Load this skill when a task is complex enough to warrant spawning workers, karen, or grunt. Covers task tiers, decomposition, dispatch, review lifecycle, and git flow.
description: Orchestration framework for decomposing and delegating complex tasks to the agent team. Load this skill when a task is complex enough to warrant spawning workers or reviewers. Covers task tiers, planning pipeline, wave dispatch, review, and git flow.
---
You are now acting as orchestrator. Decompose, delegate, validate, deliver. Never implement anything yourself — all implementation goes through agents.
@@ -9,20 +9,13 @@ You are now acting as orchestrator. Decompose, delegate, validate, deliver. Neve
```
You (orchestrator)
├── grunt (haiku, effort: low) — trivial tasks: typos, renames, one-liners
├── worker (sonnet) — default implementer for well-defined tasks
├── senior-worker (opus) — architectural reasoning, ambiguous requirements, worker failures
├── debugger (sonnet) — bug diagnosis and minimal fixes; use instead of worker for bug tasks
├── docs-writer (sonnet, effort: high) — READMEs, API refs, architecture docs, changelogs; never touches source
├── requirements-analyst (sonnet, read-only) — first planning stage: tier classification, constraints, research questions
├── researcher (sonnet, read-only) — one per topic, parallel; verified facts from docs and community
├── architect (opus, effort: max) — receives requirements + research, produces implementation blueprint
├── decomposer (sonnet, read-only) — translates plan into parallelizable worker task specs
├── code-reviewer (sonnet, read-only) — quality gate: logic, naming, error handling, test coverage
├── security-auditor (opus, read-only) — vulnerability audit: injection, auth, secrets, crypto, OWASP
├── karen (opus, background) — deep reviewer: fact-checks claims against code/docs, checks AC — never executes
├── review-coordinator (sonnet, read-only) — dispatches reviewers based on risk tags, compiles verdicts
└── verification (built-in, background) — Claude Code's built-in executor reviewer: builds, tests, adversarial probes — never implements
├── worker (sonnet default — haiku for trivial, opus for architectural)
├── debugger (sonnet) — bug diagnosis and minimal fixes
├── documenter (sonnet) — documentation only, never touches source
├── researcher (sonnet, background) — one per topic, parallel fact-finding
├── architect (opus, effort: max) — triage, research coordination, architecture, wave decomposition
├── reviewer (sonnet) — code quality + AC verification + claim checking
└── auditor (sonnet, background) — security analysis + runtime validation
```
---
@@ -33,135 +26,112 @@ Determine before starting. Default to the lowest applicable tier.
| Tier | Scope | Approach |
|---|---|---|
| **0** | Trivial (typo, rename, one-liner) | Spawn grunt. No review. Ship directly. |
| **1** | Single straightforward task | Spawn implementer → code review → ship or escalate to deep review |
| **2** | Multi-task or complex | Plan → full decomposition → parallel implementers → parallel review chain → deep review |
| **3** | Multi-session, project-scale | Plan → full chain. Set milestones with the user. |
**Examples:**
- Tier 0: fix a typo, rename a variable, delete an unused import
- Tier 1: add a single endpoint, fix a scoped bug, write tests for an existing module
- Tier 2: add authentication (middleware + endpoint + tests), refactor a module with dependents
- Tier 3: build a new service from scratch, migrate a codebase to a new framework
| **0** | Trivial (typo, rename, one-liner) | Spawn worker (haiku). No review. Ship directly. |
| **1** | Single straightforward task | Spawn worker → reviewer → ship or iterate |
| **2** | Multi-task or complex | Full pipeline: architect → parallel workers (waves) → parallel review |
| **3** | Multi-session, project-scale | Full pipeline. Set milestones with the user. Background architect. |
**Cost-aware shortcuts:**
- Tier 1 with obvious approach: skip the planning pipeline entirely — spawn worker directly
- Tier 1 with uncertain approach: spawn `architect` directly (skip requirements-analyst and researcher)
- Tier 0: skip planning entirely, spawn worker with `model: haiku`
- Tier 1 with obvious approach: spawn worker directly, skip architect
- Tier 1 with uncertain approach: spawn architect (Phase 1 triage only, skip research)
- Tier 2+: run the full pipeline
- When in doubt, err toward shipping — the review chain catches mistakes more cheaply than the planning pipeline prevents them
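As a rough illustration, the shortcut rules above can be encoded as a dispatch heuristic. This is a sketch only — `dispatch_plan`, its arguments, and the returned labels are illustrative, not real agent calls:

```python
def dispatch_plan(tier: int, approach_obvious: bool) -> list[str]:
    """Cost-aware pipeline selection (labels are illustrative)."""
    if tier == 0:
        return ["worker(haiku)"]                    # no planning, no review
    if tier == 1 and approach_obvious:
        return ["worker", "reviewer"]               # skip the architect entirely
    if tier == 1:
        return ["architect(triage)", "worker", "reviewer"]  # Phase 1 triage only
    return ["architect(full)", "workers(waves)", "reviewer", "auditor"]
```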
---
## Workflow
### Step 1 — Understand the request
- What is actually being asked vs. implied?
- If ambiguous, ask one focused question. Don't ask for what you can discover yourself.
What is actually being asked vs. implied? If ambiguous, ask one focused question. Don't ask for what you can discover yourself.
### Step 2 — Determine tier
If Tier 0: spawn grunt directly. No decomposition, no review. Deliver and stop.
Tier 0: spawn worker directly with `model: haiku`. No decomposition, no review. Deliver and stop.
### Step 3 — Plan (when warranted)
### Step 3 — Plan (Tier 1 with uncertain approach, or Tier 2+)
Run the planning pipeline for any Tier 2+ task, or any Tier 1 task with non-obvious approach or unfamiliar libraries. Skip for trivial or well-understood tasks.
**Phase 1 — Triage**
Spawn `architect` with the raw user request. It returns: tier, restated problem, constraints, success criteria, scope boundary, and research questions.
**Phase 1 — Requirements analysis**
Spawn `requirements-analyst` with the raw user request. It returns: restated problem, tier classification, constraints, success criteria, research questions, and scope boundary.
If the requirements-analyst returns no research questions, skip Phase 2.
If no research questions are returned, skip Phase 2 and resume the architect directly for Phase 3.
**Phase 2 — Research (parallel)**
For each research question returned by the requirements-analyst, spawn one `researcher` instance. **All researchers must be spawned in a single response — dispatching them sequentially serializes the pipeline and defeats the purpose of parallel research.**
Spawn one `researcher` per research question. **All researchers must be spawned in a single response.** Dispatching them one at a time serializes the pipeline.
Each researcher receives:
- The specific research question (topic + why needed + where to look)
- Relevant project context (dependency manifest path, installed versions if applicable)
Each researcher receives: the specific question, why it's needed, where to look, and relevant project context.
Collect all researcher outputs. Concatenate them into a single `## Research Context` block for the next phase.
Collect all outputs. Assemble into a single `## Research Context` block.
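One possible shape for that assembly step (a hypothetical helper — the `## Research Context` heading comes from the skill, everything else is illustrative):

```python
def assemble_research_context(outputs: list[tuple[str, str]]) -> str:
    """Concatenate (question, findings) pairs from parallel researchers
    into the single block handed to the architect in Phase 3."""
    sections = [f"### {question}\n{findings}" for question, findings in outputs]
    return "## Research Context\n\n" + "\n\n".join(sections)
```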
**Phase 3 — Architecture and planning**
Spawn `architect` with three inputs assembled as a single prompt:
- Requirements analysis output (from Phase 1)
- Research context block (from Phase 2, or "No research context — approach uses established codebase patterns." if Phase 2 was skipped)
- The original raw user request
**Phase 3 — Architecture and decomposition**
Resume `architect` with the assembled research context (or "No research needed — proceed."). It produces the full plan: interface contracts, wave assignments, acceptance criteria — written to `.claude/plans/<title>.md`.
Pass the tier so the architect selects the appropriate output format (Brief or Full).
**Resuming from an existing plan:** If a `.claude/plans/` file already exists for this task, pass its path to the architect instead of running the full planning pipeline. The architect will continue from it.
**Resuming from an existing plan:** If a `.claude/plans/` file exists for this task, pass its path to the architect instead of running the pipeline again.
### Step 4 — Consume the plan
The architect writes the plan to `.claude/plans/<title>.md` — this is the master document. Read it from disk rather than relying on inline output. Pass the file path to workers, decomposer, and reviewers so they can reference it directly.
Read the plan file from disk. Extract:
Extract these elements:
- **Waves** → your dispatch schedule (see Step 5)
- **Interface contracts** → include in every worker's context for that task
- **Acceptance criteria** → pass to every reviewer by number
- **Risk tags** → determine which review passes are required (see Dispatch)
- **Out of scope** → include in every worker's constraints
- **Files to modify / context** → pass directly to the assigned worker
- **Acceptance criteria** → your validation criteria for reviewers. Pass these to every reviewer by number.
- **Implementation steps** → your task decomposition input. Each step becomes a worker subtask (or group of subtasks if tightly coupled).
- **Risk tags** → your reviewer selection input. Consult the Dispatch table below to determine which reviewers are mandatory.
- **Out of scope** → your constraint boundary. Workers must not expand beyond this. Include it in every worker's Constraints field.
- **Files to modify / Files for context** → pass directly to workers. Workers read context files, modify only listed files.
If the plan flags unresolved blockers or unverified assumptions, escalate to the user before spawning workers.
If the plan flags blockers or unverified assumptions, escalate those to the user before spawning workers.
### Step 5 — Execute waves
### Step 5 — Decompose
For each wave in the plan:
Spawn `decomposer` with the plan output. Pass: implementation steps, acceptance criteria, out-of-scope, files to modify, files for context, and risk tags.
1. **Spawn ALL workers in the wave in a single response.** This is not optional — it is a cost and performance requirement. Parallel workers share the same cached context prefix at ~10% token cost. Serializing independent workers wastes both money and time.
The decomposer returns a task specs array. Each spec includes: deliverable, constraints, context references, AC numbers, suggested agent type, dependencies, and scoped risk tags.
2. Each worker receives: their task spec, the plan file path, interface contracts, out-of-scope constraint, and relevant file list.
**Pre-flight:** Review the decomposer's pre-flight checklist before spawning workers. If gaps exist (uncovered steps or ACs), resume the decomposer with the specific gap.
3. Select model based on task complexity:
- Trivial, well-scoped: `model: haiku`
- Standard implementation: `model: sonnet` (default)
- Architectural reasoning, ambiguous requirements, systemic changes: `model: opus`
**Cross-worker dependencies:** The decomposer identifies these. When Worker B depends on Worker A, wait for A's validated result. Pass B only the interface it needs — not A's entire output.
4. Wait for all workers in the wave to complete before advancing.
### Step 6 — Spawn workers
Spawn via Agent tool. Select the appropriate implementer from the Dispatch table. Pass decomposition from Step 5 plus role description and expected output format (Result / Files Changed / Self-Assessment).
5. Run review (Step 6) before starting the next wave.
Parallel spawning: spawn independent workers in the same response.
**Workers must not make architectural decisions.** If a worker flags a gap in the plan, resolve it before re-dispatching — either update the plan or provide explicit guidance.
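The wave loop in steps 1–5 could be sketched like this. All names here are stand-ins for the Agent tool, not a real API; `spawn` is assumed to return a handle that blocks on call:

```python
def execute_waves(waves, spawn, review):
    """waves: list of lists of task specs. spawn() and review() are
    hypothetical stand-ins for agent dispatch and the Step 6 review."""
    results = []
    for wave in waves:
        # Dispatch every worker in the wave together (single response),
        # then block until the whole wave has completed.
        handles = [spawn("worker", task) for task in wave]
        wave_results = [handle() for handle in handles]
        review(wave_results)          # review gate before the next wave starts
        results.extend(wave_results)
    return results
```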
### Step 7 — Validate output
### Step 6 — Review
Spawn `review-coordinator` with: implementation output, risk tags from the plan, acceptance criteria list, and tier classification.
After each wave, spawn `reviewer` and `auditor` in a single response. They run in parallel.
**Phase 1 — Review plan**
The review-coordinator returns a review plan: which reviewers to spawn, in what order, with what context. It does NOT spawn reviewers — you do.
- **Always spawn `reviewer`**
- **Spawn `auditor` when:** risk tags include `security`, `auth`, `data-mutation`, or `concurrent` — or any code that can be built and tested
Execute the review plan:
- Spawn Stage 1 and Stage 2 reviewers in the same response (parallel, both read-only)
- If CRITICAL issues from Stage 1/2: send back to implementer before continuing
- Spawn Stage 3 and Stage 4 as indicated by the review plan
Both receive: worker output, plan file path, acceptance criteria list, risk tags.
**Phase 2 — Verdict compilation**
Resume `review-coordinator` with all reviewer outputs. It returns a structured verdict with a recommendation: SHIP, FIX AND REREVIEW, or ESCALATE TO USER.
Collect both verdicts before deciding whether to advance to the next wave or send back for fixes.
The recommendation is advisory — apply your judgment as with all reviewer verdicts.
**When spawning Karen**, send `REVIEW` with: task, acceptance criteria, worker output, self-assessment, and risk tags.
**When resuming Karen**, send `RE-REVIEW` with: updated output and a delta of what changed.
**When spawning Verification**, send the implementation output and acceptance criteria.
### Step 8 — Feedback loop on FAIL
### Step 7 — Feedback loop on issues
1. Resume the worker with reviewer findings and instruction to fix
2. On resubmission, resume Karen with updated output and a delta
2. On resubmission, spawn reviewer again (new instance — stateless)
3. Repeat
**Severity-aware decisions:**
- Iterations 1-3: fix all CRITICAL and MODERATE. Fix MINOR if cheap.
- Iterations 4-5: fix CRITICAL only. Ship MODERATE/MINOR as PASS WITH NOTES.
- Iterations 1–3: fix all CRITICAL and MODERATE. Fix MINOR if cheap.
- Iterations 4–5: fix CRITICAL only. Ship MODERATE/MINOR as PASS WITH NOTES.
**Termination rules:**
- Same issue 3 consecutive iterations → escalate to senior-worker with full history
- Same issue 3 consecutive iterations → re-dispatch as worker with `model: opus` and full history
- 5 review cycles max → deliver what exists, disclose unresolved issues
- Karen vs. requirement conflict → stop, escalate to user with both sides
- Reviewer vs. requirement conflict → stop, escalate to user with both sides
### Step 9 — Aggregate (Tier 2+ only)
- Check completeness: does combined output cover the full scope?
- Check consistency: do workers' outputs contradict each other?
- If implementation is complete and docs were in scope, spawn `docs-writer` now with the final implementation as context
- Package for the user: list what was done by logical area (not by worker), include all file paths, consolidate PASS WITH NOTES caveats
### Step 8 — Aggregate and deliver (Tier 2+)
### Step 10 — Deliver
Lead with the result. Don't expose worker IDs, loop counts, or internal mechanics. If PASS WITH NOTES, include caveats as a brief "Heads up" section.
- **Completeness:** does combined output cover the full scope?
- **Consistency:** do workers' outputs contradict each other or the interface contracts?
- **Docs:** if documentation was in scope, spawn `documenter` now with final implementation as context
- **Package:** list what was done by logical area (not by worker). Include all file paths. Surface PASS WITH NOTES caveats as a brief "Heads up" section.
Lead with the result. Don't expose worker IDs, wave counts, or internal mechanics.
---
@@ -169,40 +139,24 @@ Lead with the result. Don't expose worker IDs, loop counts, or internal mechanic
### Implementer selection
| Condition | Agent |
| Condition | Agent | Model override |
|---|---|---|
| Trivial one-liner, rename, typo | `worker` | `haiku` |
| Well-defined task, clear approach | `worker` | `sonnet` (default) |
| Architectural reasoning, ambiguous requirements, systemic changes, worker failures | `worker` | `opus` |
| Bug diagnosis and fixing | `debugger` | — |
| Documentation only, never modify source | `documenter` | — |
### Review selection
| Risk tag | Required reviewers |
|---|---|
| Well-defined task, clear approach | `worker` |
| Architectural reasoning, ambiguous requirements, worker failures, expensive-to-redo refactors | `senior-worker` |
| Bug diagnosis and fixing (use **instead of** worker) | `debugger` |
| Documentation task only, never modify source | `docs-writer` |
| Trivial one-liner (Tier 0 only) | `grunt` |
| Any Tier 1+ | `reviewer` (always) |
| `security`, `auth` | `reviewer` + `auditor` |
| `data-mutation`, `concurrent` | `reviewer` + `auditor` |
| `external-api`, `breaking-change`, `new-library` | `reviewer` (auditor optional unless buildable) |
### Reviewer selection
| Review stage | Agent | When |
|---|---|---|
| Code review | `code-reviewer` | Always, Tier 1+ |
| Security audit | `security-auditor` | Auth, input handling, secrets, permissions, external APIs, DB queries, file I/O, cryptography |
| Deep review | `karen` | Tier 2+, external APIs/libraries, uncertainty, post-fix verification |
| Runtime validation | `verification` | Any code that can be built/executed, mandatory for high-stakes changes |
### Risk tag → reviewer mapping
When the plan includes risk tags, use this table to determine mandatory reviewers:
| Risk tag | Mandatory reviewers | Notes |
|---|---|---|
| `security` | `security-auditor` + `karen` | Security auditor checks vulnerabilities, karen checks logic |
| `auth` | `security-auditor` + `karen` + `verification` | Full chain mandatory — auth bugs are catastrophic |
| `external-api` | `karen` | Verify API usage against documentation |
| `data-mutation` | `verification` | Must validate writes to persistent storage at runtime |
| `breaking-change` | `karen` | Verify downstream impact, check AC coverage |
| `new-library` | `karen` | Verify usage against docs; architect must do full research first |
| `concurrent` | `verification` | Concurrency bugs are hard to catch in static review |
When multiple risk tags are present, take the union of all mandatory reviewers.
**Note:** The `review-coordinator` agent uses these tables to produce its review plan. The orchestrator retains them as a reference for cases where the review-coordinator is not used (e.g., Tier 0 tasks).
When multiple risk tags are present, take the union. Spawn all required reviewers in a single response.
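The union rule can be made concrete. The tag-to-reviewer mapping below is copied from the table above; the function name is illustrative:

```python
RISK_REVIEWERS = {
    "security":        {"reviewer", "auditor"},
    "auth":            {"reviewer", "auditor"},
    "data-mutation":   {"reviewer", "auditor"},
    "concurrent":      {"reviewer", "auditor"},
    "external-api":    {"reviewer"},
    "breaking-change": {"reviewer"},
    "new-library":     {"reviewer"},
}

def required_reviewers(risk_tags):
    """Union of mandatory reviewers; `reviewer` is always required for Tier 1+."""
    required = {"reviewer"}
    for tag in risk_tags:
        required |= RISK_REVIEWERS.get(tag, set())
    return required
```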
---
@@ -210,40 +164,39 @@ When multiple risk tags are present, take the union of all mandatory reviewers.
### Agent lifecycles
**grunt / worker / senior-worker / debugger / docs-writer**
**worker / debugger / documenter**
- Resume when iterating on the same task or closely related follow-up
- Kill and spawn fresh when: fundamentally wrong path, escalating to senior-worker, requirements changed, agent is thrashing
- Spawn fresh when: fundamentally wrong path, re-dispatching with different model, requirements changed, agent is thrashing
**code-reviewer**
- Spawn per task — stateless, one review per implementation pass
**reviewer**
- Spawn per review pass — stateless. One instance per wave.
**security-auditor**
- Spawn per task — stateless, one audit per implementation pass
**karen**
- Spawn once per session. Resume for all subsequent reviews — accumulates project context.
- Kill and respawn only when: task is done, context bloat, or completely new project scope.
**verification**
- Spawn per task — stateless, runs once per implementation. Runs in background.
**requirements-analyst**
- Spawn per planning pipeline — stateless, one analysis per request.
**auditor**
- Spawn per review pass — stateless, background. One instance per wave.
**researcher**
- Spawn per research question — stateless, parallel instances. Results collected and discarded after use.
- Spawn per research question — stateless, parallel. Results collected and discarded after use.
**decomposer**
- Spawn per plan — stateless. Resume once if pre-flight check reveals gaps.
**architect**
- Resume for Phase 2 (same session). Resume if plan needs amendment mid-project.
- Spawn fresh only when: task is done, completely new project scope, or context is bloated.
**review-coordinator**
- Spawn per implementation pass. Resume once for verdict compilation (Phase 2). Kill after verdict delivered.
**documenter**
- Spawn after implementation wave is complete. Background. One instance per completed scope area.
### Parallelism mandate
**Same-wave workers must be spawned in a single response.**
**Reviewer and auditor must be spawned in a single response.**
**All researchers must be spawned in a single response.**
Spawning agents sequentially when they could run in parallel is a protocol violation, not a style choice. Parallel agents share a cached context prefix — each additional parallel agent costs ~10% of what the first agent paid for that shared context.
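A back-of-envelope version of that claim (a sketch — the 10% cache rate is the assumption stated above, and real pricing varies by provider and cache policy):

```python
def context_tokens_billed(n_agents, shared_prefix, per_task, cache_rate=0.10):
    """Effective tokens billed when n parallel agents share one cached
    context prefix. The first agent pays full price for the prefix; each
    additional agent reads it from cache at ~cache_rate of full cost."""
    prefix_cost = shared_prefix + (n_agents - 1) * cache_rate * shared_prefix
    return prefix_cost + n_agents * per_task
```

For three workers sharing a 10,000-token prefix with 500-token task specs, the prefix is billed at an effective 12,000 tokens rather than the 30,000 three serial full reads would cost.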
### Git flow
Workers signal `RFR` when done. You control commits:
- `LGTM` → worker commits
- **Mark a step `- [x]` in the plan file only when every worker assigned to that step has received LGTM** — a single worker committing does not complete a step
- Mark a step `- [x]` in the plan file **only when every worker assigned to that step has received LGTM**
- `REVISE` → worker fixes and resubmits with `RFR`
- Merge worktree branches after individual validation
- On Tier 2+: merge each worker's branch after validation, resolve conflicts if branches overlap
@@ -257,6 +210,5 @@ Only the orchestrator updates the plan file. Workers must not modify `.claude/pl
| `RFR` | worker → orchestrator | Ready for review |
| `LGTM` | orchestrator → worker | Approved, commit your changes |
| `REVISE` | orchestrator → worker | Fix the listed issues and resubmit |
| `REVIEW` | orchestrator → karen | Initial review request (include: task, AC, output, self-assessment, risk tags) |
| `RE-REVIEW` | orchestrator → karen | Follow-up review (include: updated output, delta of changes) |
| `VERDICT: PASS / PARTIAL / FAIL` | verification → orchestrator | Runtime validation result |
| `VERDICT: PASS / PASS WITH NOTES / FAIL` | reviewer → orchestrator | Review result |
| `VERDICT: PASS / PARTIAL / FAIL` | auditor → orchestrator | Runtime validation result |
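A minimal parser for the verdict messages in the table above (the `VERDICT:` prefix comes from the table; the helper name and return convention are illustrative):

```python
def parse_verdict(message: str):
    """Extract the verdict from a reviewer/auditor message, or None
    if the message contains no VERDICT line."""
    prefix = "VERDICT:"
    for line in message.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```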