refactor: compress 14-agent team to 7 with wave-based parallelism

- Merge grunt + worker + senior-worker → worker (model scaled by orchestrator)
- Merge code-reviewer + karen → reviewer (quality + claim verification)
- Merge security-auditor + verification → auditor (security + runtime, background)
- Architect absorbs requirements-analyst + decomposer (two-phase: triage then plan)
- Rename docs-writer → documenter
- Remove review-coordinator (logic absorbed into orchestrate skill)
- Orchestrate skill: wave-based dispatch, parallelism as hard protocol requirement
  with explicit cost rationale (~10% token cost for shared cached context)
Bryan Ramos 2026-04-01 22:09:30 -04:00
parent 7274e79e00
commit 5f534cbc64
16 changed files with 398 additions and 835 deletions


@@ -1,6 +1,6 @@
---
name: orchestrate
description: Orchestration framework for decomposing and delegating complex tasks to the agent team. Load this skill when a task is complex enough to warrant spawning workers, karen, or grunt. Covers task tiers, decomposition, dispatch, review lifecycle, and git flow.
description: Orchestration framework for decomposing and delegating complex tasks to the agent team. Load this skill when a task is complex enough to warrant spawning workers or reviewers. Covers task tiers, planning pipeline, wave dispatch, review, and git flow.
---
You are now acting as orchestrator. Decompose, delegate, validate, deliver. Never implement anything yourself — all implementation goes through agents.
@@ -9,20 +9,13 @@ You are now acting as orchestrator. Decompose, delegate, validate, deliver. Neve
```
You (orchestrator)
├── grunt (haiku, effort: low) — trivial tasks: typos, renames, one-liners
├── worker (sonnet) — default implementer for well-defined tasks
├── senior-worker (opus) — architectural reasoning, ambiguous requirements, worker failures
├── debugger (sonnet) — bug diagnosis and minimal fixes; use instead of worker for bug tasks
├── docs-writer (sonnet, effort: high) — READMEs, API refs, architecture docs, changelogs; never touches source
├── requirements-analyst (sonnet, read-only) — first planning stage: tier classification, constraints, research questions
├── researcher (sonnet, read-only) — one per topic, parallel; verified facts from docs and community
├── architect (opus, effort: max) — receives requirements + research, produces implementation blueprint
├── decomposer (sonnet, read-only) — translates plan into parallelizable worker task specs
├── code-reviewer (sonnet, read-only) — quality gate: logic, naming, error handling, test coverage
├── security-auditor (opus, read-only) — vulnerability audit: injection, auth, secrets, crypto, OWASP
├── karen (opus, background) — deep reviewer: fact-checks claims against code/docs, checks AC — never executes
├── review-coordinator (sonnet, read-only) — dispatches reviewers based on risk tags, compiles verdicts
└── verification (built-in, background) — Claude Code's built-in executor reviewer: builds, tests, adversarial probes — never implements
├── worker (sonnet default — haiku for trivial, opus for architectural)
├── debugger (sonnet) — bug diagnosis and minimal fixes
├── documenter (sonnet) — documentation only, never touches source
├── researcher (sonnet, background) — one per topic, parallel fact-finding
├── architect (opus, effort: max) — triage, research coordination, architecture, wave decomposition
├── reviewer (sonnet) — code quality + AC verification + claim checking
└── auditor (sonnet, background) — security analysis + runtime validation
```
---
@@ -33,135 +26,112 @@ Determine before starting. Default to the lowest applicable tier.
| Tier | Scope | Approach |
|---|---|---|
| **0** | Trivial (typo, rename, one-liner) | Spawn grunt. No review. Ship directly. |
| **1** | Single straightforward task | Spawn implementer → code review → ship or escalate to deep review |
| **2** | Multi-task or complex | Plan → full decomposition → parallel implementers → parallel review chain → deep review |
| **3** | Multi-session, project-scale | Plan → full chain. Set milestones with the user. |
**Examples:**
- Tier 0: fix a typo, rename a variable, delete an unused import
- Tier 1: add a single endpoint, fix a scoped bug, write tests for an existing module
- Tier 2: add authentication (middleware + endpoint + tests), refactor a module with dependents
- Tier 3: build a new service from scratch, migrate a codebase to a new framework
| **0** | Trivial (typo, rename, one-liner) | Spawn worker (haiku). No review. Ship directly. |
| **1** | Single straightforward task | Spawn worker → reviewer → ship or iterate |
| **2** | Multi-task or complex | Full pipeline: architect → parallel workers (waves) → parallel review |
| **3** | Multi-session, project-scale | Full pipeline. Set milestones with the user. Background architect. |
**Cost-aware shortcuts:**
- Tier 1 with obvious approach: skip the planning pipeline entirely — spawn worker directly
- Tier 1 with uncertain approach: spawn `architect` directly (skip requirements-analyst and researcher)
- Tier 0: skip planning entirely, spawn worker with `model: haiku`
- Tier 1 with obvious approach: spawn worker directly, skip architect
- Tier 1 with uncertain approach: spawn architect (Phase 1 triage only, skip research)
- Tier 2+: run the full pipeline
- When in doubt, err toward shipping — the review chain catches mistakes more cheaply than the planning pipeline prevents them
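As a rough illustration, the shortcut rules above can be encoded as a dispatch heuristic. This is a sketch only — `dispatch_plan`, its arguments, and the returned labels are illustrative, not real agent calls:

```python
def dispatch_plan(tier: int, approach_obvious: bool) -> list[str]:
    """Cost-aware pipeline selection (labels are illustrative)."""
    if tier == 0:
        return ["worker(haiku)"]                    # no planning, no review
    if tier == 1 and approach_obvious:
        return ["worker", "reviewer"]               # skip the architect entirely
    if tier == 1:
        return ["architect(triage)", "worker", "reviewer"]  # Phase 1 triage only
    return ["architect(full)", "workers(waves)", "reviewer", "auditor"]
```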
---
## Workflow
### Step 1 — Understand the request
- What is actually being asked vs. implied?
- If ambiguous, ask one focused question. Don't ask for what you can discover yourself.
What is actually being asked vs. implied? If ambiguous, ask one focused question. Don't ask for what you can discover yourself.
### Step 2 — Determine tier
If Tier 0: spawn grunt directly. No decomposition, no review. Deliver and stop.
Tier 0: spawn worker directly with `model: haiku`. No decomposition, no review. Deliver and stop.
### Step 3 — Plan (when warranted)
### Step 3 — Plan (Tier 1 with uncertain approach, or Tier 2+)
Run the planning pipeline for any Tier 2+ task, or any Tier 1 task with non-obvious approach or unfamiliar libraries. Skip for trivial or well-understood tasks.
**Phase 1 — Triage**
Spawn `architect` with the raw user request. It returns: tier, restated problem, constraints, success criteria, scope boundary, and research questions.
**Phase 1 — Requirements analysis**
Spawn `requirements-analyst` with the raw user request. It returns: restated problem, tier classification, constraints, success criteria, research questions, and scope boundary.
If the requirements-analyst returns no research questions, skip Phase 2.
If no research questions are returned, skip Phase 2 and resume the architect directly for Phase 3.
**Phase 2 — Research (parallel)**
For each research question returned by the requirements-analyst, spawn one `researcher` instance. **All researchers must be spawned in a single response — dispatching them sequentially serializes the pipeline and defeats the purpose of parallel research.**
Spawn one `researcher` per research question. **All researchers must be spawned in a single response.** Dispatching them one at a time serializes the pipeline.
Each researcher receives:
- The specific research question (topic + why needed + where to look)
- Relevant project context (dependency manifest path, installed versions if applicable)
Each researcher receives: the specific question, why it's needed, where to look, and relevant project context.
Collect all researcher outputs. Concatenate them into a single `## Research Context` block for the next phase.
Collect all outputs. Assemble into a single `## Research Context` block.
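One possible shape for that assembly step (a hypothetical helper — the `## Research Context` heading comes from the skill, everything else is illustrative):

```python
def assemble_research_context(outputs: list[tuple[str, str]]) -> str:
    """Concatenate (question, findings) pairs from parallel researchers
    into the single block handed to the architect in Phase 3."""
    sections = [f"### {question}\n{findings}" for question, findings in outputs]
    return "## Research Context\n\n" + "\n\n".join(sections)
```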
**Phase 3 — Architecture and planning**
Spawn `architect` with three inputs assembled as a single prompt:
- Requirements analysis output (from Phase 1)
- Research context block (from Phase 2, or "No research context — approach uses established codebase patterns." if Phase 2 was skipped)
- The original raw user request
**Phase 3 — Architecture and decomposition**
Resume `architect` with the assembled research context (or "No research needed — proceed."). It produces the full plan: interface contracts, wave assignments, acceptance criteria — written to `.claude/plans/<title>.md`.
Pass the tier so the architect selects the appropriate output format (Brief or Full).
**Resuming from an existing plan:** If a `.claude/plans/` file already exists for this task, pass its path to the architect instead of running the full planning pipeline. The architect will continue from it.
**Resuming from an existing plan:** If a `.claude/plans/` file exists for this task, pass its path to the architect instead of running the pipeline again.
### Step 4 — Consume the plan
The architect writes the plan to `.claude/plans/<title>.md` — this is the master document. Read it from disk rather than relying on inline output. Pass the file path to workers, decomposer, and reviewers so they can reference it directly.
Read the plan file from disk. Extract:
Extract these elements:
- **Waves** → your dispatch schedule (see Step 5)
- **Interface contracts** → include in every worker's context for that task
- **Acceptance criteria** → pass to every reviewer by number
- **Risk tags** → determine which review passes are required (see Dispatch)
- **Out of scope** → include in every worker's constraints
- **Files to modify / context** → pass directly to the assigned worker
- **Acceptance criteria** → your validation criteria for reviewers. Pass these to every reviewer by number.
- **Implementation steps** → your task decomposition input. Each step becomes a worker subtask (or group of subtasks if tightly coupled).
- **Risk tags** → your reviewer selection input. Consult the Dispatch table below to determine which reviewers are mandatory.
- **Out of scope** → your constraint boundary. Workers must not expand beyond this. Include it in every worker's Constraints field.
- **Files to modify / Files for context** → pass directly to workers. Workers read context files, modify only listed files.
If the plan flags unresolved blockers or unverified assumptions, escalate to the user before spawning workers.
If the plan flags blockers or unverified assumptions, escalate those to the user before spawning workers.
### Step 5 — Execute waves
### Step 5 — Decompose
For each wave in the plan:
Spawn `decomposer` with the plan output. Pass: implementation steps, acceptance criteria, out-of-scope, files to modify, files for context, and risk tags.
1. **Spawn ALL workers in the wave in a single response.** This is not optional — it is a cost and performance requirement. Parallel workers share the same cached context prefix at ~10% token cost. Serializing independent workers wastes both money and time.
The decomposer returns a task specs array. Each spec includes: deliverable, constraints, context references, AC numbers, suggested agent type, dependencies, and scoped risk tags.
2. Each worker receives: their task spec, the plan file path, interface contracts, out-of-scope constraint, and relevant file list.
**Pre-flight:** Review the decomposer's pre-flight checklist before spawning workers. If gaps exist (uncovered steps or ACs), resume the decomposer with the specific gap.
3. Select model based on task complexity:
- Trivial, well-scoped: `model: haiku`
- Standard implementation: `model: sonnet` (default)
- Architectural reasoning, ambiguous requirements, systemic changes: `model: opus`
**Cross-worker dependencies:** The decomposer identifies these. When Worker B depends on Worker A, wait for A's validated result. Pass B only the interface it needs — not A's entire output.
4. Wait for all workers in the wave to complete before advancing.
### Step 6 — Spawn workers
Spawn via Agent tool. Select the appropriate implementer from the Dispatch table. Pass decomposition from Step 5 plus role description and expected output format (Result / Files Changed / Self-Assessment).
5. Run review (Step 6) before starting the next wave.
Parallel spawning: spawn independent workers in the same response.
**Workers must not make architectural decisions.** If a worker flags a gap in the plan, resolve it before re-dispatching — either update the plan or provide explicit guidance.
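The wave loop in steps 1–5 could be sketched like this. All names here are stand-ins for the Agent tool, not a real API; `spawn` is assumed to return a handle that blocks on call:

```python
def execute_waves(waves, spawn, review):
    """waves: list of lists of task specs. spawn() and review() are
    hypothetical stand-ins for agent dispatch and the Step 6 review."""
    results = []
    for wave in waves:
        # Dispatch every worker in the wave together (single response),
        # then block until the whole wave has completed.
        handles = [spawn("worker", task) for task in wave]
        wave_results = [handle() for handle in handles]
        review(wave_results)          # review gate before the next wave starts
        results.extend(wave_results)
    return results
```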
### Step 7 — Validate output
### Step 6 — Review
Spawn `review-coordinator` with: implementation output, risk tags from the plan, acceptance criteria list, and tier classification.
After each wave, spawn `reviewer` and `auditor` in a single response. They run in parallel.
**Phase 1 — Review plan**
The review-coordinator returns a review plan: which reviewers to spawn, in what order, with what context. It does NOT spawn reviewers — you do.
- **Always spawn `reviewer`**
- **Spawn `auditor` when:** risk tags include `security`, `auth`, `data-mutation`, or `concurrent` — or any code that can be built and tested
Execute the review plan:
- Spawn Stage 1 and Stage 2 reviewers in the same response (parallel, both read-only)
- If CRITICAL issues from Stage 1/2: send back to implementer before continuing
- Spawn Stage 3 and Stage 4 as indicated by the review plan
Both receive: worker output, plan file path, acceptance criteria list, risk tags.
**Phase 2 — Verdict compilation**
Resume `review-coordinator` with all reviewer outputs. It returns a structured verdict with a recommendation: SHIP, FIX AND REREVIEW, or ESCALATE TO USER.
Collect both verdicts before deciding whether to advance to the next wave or send back for fixes.
The recommendation is advisory — apply your judgment as with all reviewer verdicts.
**When spawning Karen**, send `REVIEW` with: task, acceptance criteria, worker output, self-assessment, and risk tags.
**When resuming Karen**, send `RE-REVIEW` with: updated output and a delta of what changed.
**When spawning Verification**, send the implementation output and acceptance criteria.
### Step 8 — Feedback loop on FAIL
### Step 7 — Feedback loop on issues
1. Resume the worker with reviewer findings and instruction to fix
2. On resubmission, resume Karen with updated output and a delta
2. On resubmission, spawn reviewer again (new instance — stateless)
3. Repeat
**Severity-aware decisions:**
- Iterations 1-3: fix all CRITICAL and MODERATE. Fix MINOR if cheap.
- Iterations 4-5: fix CRITICAL only. Ship MODERATE/MINOR as PASS WITH NOTES.
- Iterations 1–3: fix all CRITICAL and MODERATE. Fix MINOR if cheap.
- Iterations 4–5: fix CRITICAL only. Ship MODERATE/MINOR as PASS WITH NOTES.
**Termination rules:**
- Same issue 3 consecutive iterations → escalate to senior-worker with full history
- Same issue 3 consecutive iterations → re-dispatch as worker with `model: opus` and full history
- 5 review cycles max → deliver what exists, disclose unresolved issues
- Karen vs. requirement conflict → stop, escalate to user with both sides
- Reviewer vs. requirement conflict → stop, escalate to user with both sides
### Step 9 — Aggregate (Tier 2+ only)
- Check completeness: does combined output cover the full scope?
- Check consistency: do workers' outputs contradict each other?
- If implementation is complete and docs were in scope, spawn `docs-writer` now with the final implementation as context
- Package for the user: list what was done by logical area (not by worker), include all file paths, consolidate PASS WITH NOTES caveats
### Step 8 — Aggregate and deliver (Tier 2+)
### Step 10 — Deliver
Lead with the result. Don't expose worker IDs, loop counts, or internal mechanics. If PASS WITH NOTES, include caveats as a brief "Heads up" section.
- **Completeness:** does combined output cover the full scope?
- **Consistency:** do workers' outputs contradict each other or the interface contracts?
- **Docs:** if documentation was in scope, spawn `documenter` now with final implementation as context
- **Package:** list what was done by logical area (not by worker). Include all file paths. Surface PASS WITH NOTES caveats as a brief "Heads up" section.
Lead with the result. Don't expose worker IDs, wave counts, or internal mechanics.
---
@@ -169,40 +139,24 @@ Lead with the result. Don't expose worker IDs, loop counts, or internal mechanic
### Implementer selection
| Condition | Agent |
| Condition | Agent | Model override |
|---|---|---|
| Trivial one-liner, rename, typo | `worker` | `haiku` |
| Well-defined task, clear approach | `worker` | `sonnet` (default) |
| Architectural reasoning, ambiguous requirements, systemic changes, worker failures | `worker` | `opus` |
| Bug diagnosis and fixing | `debugger` | — |
| Documentation only, never modify source | `documenter` | — |
### Review selection
| Risk tag | Required reviewers |
|---|---|
| Well-defined task, clear approach | `worker` |
| Architectural reasoning, ambiguous requirements, worker failures, expensive-to-redo refactors | `senior-worker` |
| Bug diagnosis and fixing (use **instead of** worker) | `debugger` |
| Documentation task only, never modify source | `docs-writer` |
| Trivial one-liner (Tier 0 only) | `grunt` |
| Any Tier 1+ | `reviewer` (always) |
| `security`, `auth` | `reviewer` + `auditor` |
| `data-mutation`, `concurrent` | `reviewer` + `auditor` |
| `external-api`, `breaking-change`, `new-library` | `reviewer` (auditor optional unless buildable) |
### Reviewer selection
| Review stage | Agent | When |
|---|---|---|
| Code review | `code-reviewer` | Always, Tier 1+ |
| Security audit | `security-auditor` | Auth, input handling, secrets, permissions, external APIs, DB queries, file I/O, cryptography |
| Deep review | `karen` | Tier 2+, external APIs/libraries, uncertainty, post-fix verification |
| Runtime validation | `verification` | Any code that can be built/executed, mandatory for high-stakes changes |
### Risk tag → reviewer mapping
When the plan includes risk tags, use this table to determine mandatory reviewers:
| Risk tag | Mandatory reviewers | Notes |
|---|---|---|
| `security` | `security-auditor` + `karen` | Security auditor checks vulnerabilities, karen checks logic |
| `auth` | `security-auditor` + `karen` + `verification` | Full chain mandatory — auth bugs are catastrophic |
| `external-api` | `karen` | Verify API usage against documentation |
| `data-mutation` | `verification` | Must validate writes to persistent storage at runtime |
| `breaking-change` | `karen` | Verify downstream impact, check AC coverage |
| `new-library` | `karen` | Verify usage against docs; architect must do full research first |
| `concurrent` | `verification` | Concurrency bugs are hard to catch in static review |
When multiple risk tags are present, take the union of all mandatory reviewers.
**Note:** The `review-coordinator` agent uses these tables to produce its review plan. The orchestrator retains them as a reference for cases where the review-coordinator is not used (e.g., Tier 0 tasks).
When multiple risk tags are present, take the union. Spawn all required reviewers in a single response.
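The union rule can be made concrete. The tag-to-reviewer mapping below is copied from the table above; the function name is illustrative:

```python
RISK_REVIEWERS = {
    "security":        {"reviewer", "auditor"},
    "auth":            {"reviewer", "auditor"},
    "data-mutation":   {"reviewer", "auditor"},
    "concurrent":      {"reviewer", "auditor"},
    "external-api":    {"reviewer"},
    "breaking-change": {"reviewer"},
    "new-library":     {"reviewer"},
}

def required_reviewers(risk_tags):
    """Union of mandatory reviewers; `reviewer` is always required for Tier 1+."""
    required = {"reviewer"}
    for tag in risk_tags:
        required |= RISK_REVIEWERS.get(tag, set())
    return required
```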
---
@@ -210,40 +164,39 @@ When multiple risk tags are present, take the union of all mandatory reviewers.
### Agent lifecycles
**grunt / worker / senior-worker / debugger / docs-writer**
**worker / debugger / documenter**
- Resume when iterating on the same task or closely related follow-up
- Kill and spawn fresh when: fundamentally wrong path, escalating to senior-worker, requirements changed, agent is thrashing
- Spawn fresh when: fundamentally wrong path, re-dispatching with different model, requirements changed, agent is thrashing
**code-reviewer**
- Spawn per task — stateless, one review per implementation pass
**reviewer**
- Spawn per review pass — stateless. One instance per wave.
**security-auditor**
- Spawn per task — stateless, one audit per implementation pass
**karen**
- Spawn once per session. Resume for all subsequent reviews — accumulates project context.
- Kill and respawn only when: task is done, context bloat, or completely new project scope.
**verification**
- Spawn per task — stateless, runs once per implementation. Runs in background.
**requirements-analyst**
- Spawn per planning pipeline — stateless, one analysis per request.
**auditor**
- Spawn per review pass — stateless, background. One instance per wave.
**researcher**
- Spawn per research question — stateless, parallel instances. Results collected and discarded after use.
- Spawn per research question — stateless, parallel. Results collected and discarded after use.
**decomposer**
- Spawn per plan — stateless. Resume once if pre-flight check reveals gaps.
**architect**
- Resume for Phase 2 (same session). Resume if plan needs amendment mid-project.
- Spawn fresh only when: task is done, completely new project scope, or context is bloated.
**review-coordinator**
- Spawn per implementation pass. Resume once for verdict compilation (Phase 2). Kill after verdict delivered.
**documenter**
- Spawn after implementation wave is complete. Background. One instance per completed scope area.
### Parallelism mandate
**Same-wave workers must be spawned in a single response.**
**Reviewer and auditor must be spawned in a single response.**
**All researchers must be spawned in a single response.**
Spawning agents sequentially when they could run in parallel is a protocol violation, not a style choice. Parallel agents share a cached context prefix — each additional parallel agent costs ~10% of what the first agent paid for that shared context.
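A back-of-envelope version of that claim (a sketch — the 10% cache rate is the assumption stated above, and real pricing varies by provider and cache policy):

```python
def context_tokens_billed(n_agents, shared_prefix, per_task, cache_rate=0.10):
    """Effective tokens billed when n parallel agents share one cached
    context prefix. The first agent pays full price for the prefix; each
    additional agent reads it from cache at ~cache_rate of full cost."""
    prefix_cost = shared_prefix + (n_agents - 1) * cache_rate * shared_prefix
    return prefix_cost + n_agents * per_task
```

For three workers sharing a 10,000-token prefix with 500-token task specs, the prefix is billed at an effective 12,000 tokens rather than the 30,000 three serial full reads would cost.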
### Git flow
Workers signal `RFR` when done. You control commits:
- `LGTM` → worker commits
- **Mark a step `- [x]` in the plan file only when every worker assigned to that step has received LGTM** — a single worker committing does not complete a step
- Mark a step `- [x]` in the plan file **only when every worker assigned to that step has received LGTM**
- `REVISE` → worker fixes and resubmits with `RFR`
- Merge worktree branches after individual validation
- On Tier 2+: merge each worker's branch after validation, resolve conflicts if branches overlap
@@ -257,6 +210,5 @@ Only the orchestrator updates the plan file. Workers must not modify `.claude/pl
| `RFR` | worker → orchestrator | Ready for review |
| `LGTM` | orchestrator → worker | Approved, commit your changes |
| `REVISE` | orchestrator → worker | Fix the listed issues and resubmit |
| `REVIEW` | orchestrator → karen | Initial review request (include: task, AC, output, self-assessment, risk tags) |
| `RE-REVIEW` | orchestrator → karen | Follow-up review (include: updated output, delta of changes) |
| `VERDICT: PASS / PARTIAL / FAIL` | verification → orchestrator | Runtime validation result |
| `VERDICT: PASS / PASS WITH NOTES / FAIL` | reviewer → orchestrator | Review result |
| `VERDICT: PASS / PARTIAL / FAIL` | auditor → orchestrator | Runtime validation result |
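A minimal parser for the verdict messages in the table above (the `VERDICT:` prefix comes from the table; the helper name and return convention are illustrative):

```python
def parse_verdict(message: str):
    """Extract the verdict from a reviewer/auditor message, or None
    if the message contains no VERDICT line."""
    prefix = "VERDICT:"
    for line in message.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```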