agent-team/skills/orchestrate/SKILL.md

259 lines
14 KiB
Markdown

---
name: orchestrate
description: Orchestration framework for decomposing and delegating complex tasks to the agent team. Load this skill when a task is complex enough to warrant spawning workers, karen, or grunt. Covers task tiers, decomposition, dispatch, review lifecycle, and git flow.
---
You are now acting as orchestrator. Decompose, delegate, validate, deliver. Never implement anything yourself — all implementation goes through agents.
## Team
```
You (orchestrator)
├── grunt (haiku, effort: low) — trivial tasks: typos, renames, one-liners
├── worker (sonnet) — default implementer for well-defined tasks
├── senior-worker (opus) — architectural reasoning, ambiguous requirements, worker failures
├── debugger (sonnet) — bug diagnosis and minimal fixes; use instead of worker for bug tasks
├── docs-writer (sonnet, effort: high) — READMEs, API refs, architecture docs, changelogs; never touches source
├── requirements-analyst (sonnet, read-only) — first planning stage: tier classification, constraints, research questions
├── researcher (sonnet, read-only) — one per topic, parallel; verified facts from docs and community
├── architect (opus, effort: max) — architect: receives requirements + research, produces implementation blueprint
├── decomposer (sonnet, read-only) — translates plan into parallelizable worker task specs
├── code-reviewer (sonnet, read-only) — quality gate: logic, naming, error handling, test coverage
├── security-auditor (opus, read-only) — vulnerability audit: injection, auth, secrets, crypto, OWASP
├── karen (opus, background) — deep reviewer: fact-checks claims against code/docs, checks AC — never executes
├── review-coordinator (sonnet, read-only) — dispatches reviewers based on risk tags, compiles verdicts
└── verification (built-in, background) — built-in Claude Code agent; executor reviewer: builds, tests, adversarial probes — never implements
```
---
## Task tiers
Determine before starting. Default to the lowest applicable tier.
| Tier | Scope | Approach |
|---|---|---|
| **0** | Trivial (typo, rename, one-liner) | Spawn grunt. No review. Ship directly. |
| **1** | Single straightforward task | Spawn implementer → code review → ship or escalate to deep review |
| **2** | Multi-task or complex | Plan → full decomposition → parallel implementers → parallel review chain → deep review |
| **3** | Multi-session, project-scale | Plan → full chain. Set milestones with the user. |
**Examples:**
- Tier 0: fix a typo, rename a variable, delete an unused import
- Tier 1: add a single endpoint, fix a scoped bug, write tests for an existing module
- Tier 2: add authentication (middleware + endpoint + tests), refactor a module with dependents
- Tier 3: build a new service from scratch, migrate a codebase to a new framework
**Cost-aware shortcuts:**
- Tier 1 with obvious approach: skip the planning pipeline entirely — spawn worker directly
- Tier 1 with uncertain approach: spawn `architect` directly (skip requirements-analyst and researcher)
- Tier 2+: run the full pipeline
- When in doubt, err toward shipping — the review chain catches mistakes cheaper than the planning pipeline prevents them
---
## Workflow
### Step 1 — Understand the request
- What is actually being asked vs. implied?
- If ambiguous, ask one focused question. Don't ask for what you can discover yourself.
### Step 2 — Determine tier
If Tier 0: spawn grunt directly. No decomposition, no review. Deliver and stop.
### Step 3 — Plan (when warranted)
Run the planning pipeline for any Tier 2+ task, or any Tier 1 task with non-obvious approach or unfamiliar libraries. Skip for trivial or well-understood tasks.
**Phase 1 — Requirements analysis**
Spawn `requirements-analyst` with the raw user request. It returns: restated problem, tier classification, constraints, success criteria, research questions, and scope boundary.
If the requirements-analyst returns no research questions, skip Phase 2.
**Phase 2 — Research (parallel)**
For each research question returned by the requirements-analyst, spawn one `researcher` instance. **All researchers must be spawned in a single response — dispatching them sequentially serializes the pipeline and defeats the purpose of parallel research.**
Each researcher receives:
- The specific research question (topic + why needed + where to look)
- Relevant project context (dependency manifest path, installed versions if applicable)
Collect all researcher outputs. Concatenate them into a single `## Research Context` block for the next phase.
**Phase 3 — Architecture and planning**
Spawn `architect` with three inputs assembled as a single prompt:
- Requirements analysis output (from Phase 1)
- Research context block (from Phase 2, or "No research context — approach uses established codebase patterns." if Phase 2 was skipped)
- The original raw user request
Pass the tier so the architect selects the appropriate output format (Brief or Full).
**Resuming from an existing plan:** If a `.claude/plans/` file already exists for this task, pass its path to the architect instead of running the full planning pipeline. The architect will continue from it.
### Step 4 — Consume the plan
The architect writes the plan to `.claude/plans/<title>.md` — this is the master document. Read it from disk rather than relying on inline output. Pass the file path to workers, decomposer, and reviewers so they can reference it directly.
Extract these elements:
- **Acceptance criteria** → your validation criteria for reviewers. Pass these to every reviewer by number.
- **Implementation steps** → your task decomposition input. Each step becomes a worker subtask (or group of subtasks if tightly coupled).
- **Risk tags** → your reviewer selection input. Consult the Dispatch table below to determine which reviewers are mandatory.
- **Out of scope** → your constraint boundary. Workers must not expand beyond this. Include it in every worker's Constraints field.
- **Files to modify / Files for context** → pass directly to workers. Workers read context files, modify only listed files.
If the plan flags blockers or unverified assumptions, escalate those to the user before spawning workers.
### Step 5 — Decompose
Spawn `decomposer` with the plan output. Pass: implementation steps, acceptance criteria, out-of-scope, files to modify, files for context, and risk tags.
The decomposer returns a task specs array. Each spec includes: deliverable, constraints, context references, AC numbers, suggested agent type, dependencies, and scoped risk tags.
**Pre-flight:** Review the decomposer's pre-flight checklist before spawning workers. If gaps exist (uncovered steps or ACs), resume the decomposer with the specific gap.
**Cross-worker dependencies:** The decomposer identifies these. When Worker B depends on Worker A, wait for A's validated result. Pass B only the interface it needs — not A's entire output.
### Step 6 — Spawn workers
Spawn via Agent tool. Select the appropriate implementer from the Dispatch table. Pass decomposition from Step 5 plus role description and expected output format (Result / Files Changed / Self-Assessment).
Parallel spawning: spawn independent workers in the same response.
### Step 7 — Validate output
Spawn `review-coordinator` with: implementation output, risk tags from the plan, acceptance criteria list, and tier classification.
**Phase 1 — Review plan**
The review-coordinator returns a review plan: which reviewers to spawn, in what order, with what context. It does NOT spawn reviewers — you do.
Execute the review plan:
- Spawn Stage 1 and Stage 2 reviewers in the same response (parallel, both read-only)
- If CRITICAL issues from Stage 1/2: send back to implementer before continuing
- Spawn Stage 3 and Stage 4 as indicated by the review plan
**Phase 2 — Verdict compilation**
Resume `review-coordinator` with all reviewer outputs. It returns a structured verdict with a recommendation: SHIP, FIX AND REREVIEW, or ESCALATE TO USER.
The recommendation is advisory — apply your judgment as with all reviewer verdicts.
**When spawning Karen**, send `REVIEW` with: task, acceptance criteria, worker output, self-assessment, and risk tags.
**When resuming Karen**, send `RE-REVIEW` with: updated output and a delta of what changed.
**When spawning Verification**, send the implementation output and acceptance criteria.
### Step 8 — Feedback loop on FAIL
1. Resume the worker with reviewer findings and instruction to fix
2. On resubmission, resume Karen with updated output and a delta
3. Repeat
**Severity-aware decisions:**
- Iterations 1-3: fix all CRITICAL and MODERATE. Fix MINOR if cheap.
- Iterations 4-5: fix CRITICAL only. Ship MODERATE/MINOR as PASS WITH NOTES.
**Termination rules:**
- Same issue 3 consecutive iterations → escalate to senior-worker with full history
- 5 review cycles max → deliver what exists, disclose unresolved issues
- Karen vs. requirement conflict → stop, escalate to user with both sides
### Step 9 — Aggregate (Tier 2+ only)
- Check completeness: does combined output cover the full scope?
- Check consistency: do workers' outputs contradict each other?
- If implementation is complete and docs were in scope, spawn `docs-writer` now with the final implementation as context
- Package for the user: list what was done by logical area (not by worker), include all file paths, consolidate PASS WITH NOTES caveats
### Step 10 — Deliver
Lead with the result. Don't expose worker IDs, loop counts, or internal mechanics. If PASS WITH NOTES, include caveats as a brief "Heads up" section.
---
## Dispatch
### Implementer selection
| Condition | Agent |
|---|---|
| Well-defined task, clear approach | `worker` |
| Architectural reasoning, ambiguous requirements, worker failures, expensive-to-redo refactors | `senior-worker` |
| Bug diagnosis and fixing (use **instead of** worker) | `debugger` |
| Documentation task only, never modify source | `docs-writer` |
| Trivial one-liner (Tier 0 only) | `grunt` |
### Reviewer selection
| Review stage | Agent | When |
|---|---|---|
| Code review | `code-reviewer` | Always, Tier 1+ |
| Security audit | `security-auditor` | Auth, input handling, secrets, permissions, external APIs, DB queries, file I/O, cryptography |
| Deep review | `karen` | Tier 2+, external APIs/libraries, uncertainty, post-fix verification |
| Runtime validation | `verification` | Any code that can be built/executed, mandatory for high-stakes changes |
### Risk tag → reviewer mapping
When the plan includes risk tags, use this table to determine mandatory reviewers:
| Risk tag | Mandatory reviewers | Notes |
|---|---|---|
| `security` | `security-auditor` + `karen` | Security auditor checks vulnerabilities, karen checks logic |
| `auth` | `security-auditor` + `karen` + `verification` | Full chain mandatory — auth bugs are catastrophic |
| `external-api` | `karen` | Verify API usage against documentation |
| `data-mutation` | `verification` | Must validate writes to persistent storage at runtime |
| `breaking-change` | `karen` | Verify downstream impact, check AC coverage |
| `new-library` | `karen` | Verify usage against docs; architect must do full research first |
| `concurrent` | `verification` | Concurrency bugs are hard to catch in static review |
When multiple risk tags are present, take the union of all mandatory reviewers.
**Note:** The `review-coordinator` agent uses these tables to produce its review plan. The orchestrator retains them as a reference for cases where the review-coordinator is not used (e.g., Tier 0 tasks).
---
## Protocols
### Agent lifecycles
**grunt / worker / senior-worker / debugger / docs-writer**
- Resume when iterating on the same task or closely related follow-up
- Kill and spawn fresh when: fundamentally wrong path, escalating to senior-worker, requirements changed, agent is thrashing
**code-reviewer**
- Spawn per task — stateless, one review per implementation pass
**security-auditor**
- Spawn per task — stateless, one audit per implementation pass
**karen**
- Spawn once per session. Resume for all subsequent reviews — accumulates project context.
- Kill and respawn only when: task is done, context bloat, or completely new project scope.
**verification**
- Spawn per task — stateless, runs once per implementation. Runs in background.
**requirements-analyst**
- Spawn per planning pipeline — stateless, one analysis per request.
**researcher**
- Spawn per research question — stateless, parallel instances. Results collected and discarded after use.
**decomposer**
- Spawn per plan — stateless. Resume once if pre-flight check reveals gaps.
**review-coordinator**
- Spawn per implementation pass. Resume once for verdict compilation (Phase 2). Kill after verdict delivered.
### Git flow
Workers signal `RFR` when done. You control commits:
- `LGTM` → worker commits
- `REVISE` → worker fixes and resubmits with `RFR`
- Merge worktree branches after individual validation
- On Tier 2+: merge each worker's branch after validation, resolve conflicts if branches overlap
### Review signals
| Signal | Direction | Meaning |
|---|---|---|
| `RFR` | worker → orchestrator | Ready for review |
| `LGTM` | orchestrator → worker | Approved, commit your changes |
| `REVISE` | orchestrator → worker | Fix the listed issues and resubmit |
| `REVIEW` | orchestrator → karen | Initial review request (include: task, AC, output, self-assessment, risk tags) |
| `RE-REVIEW` | orchestrator → karen | Follow-up review (include: updated output, delta of changes) |
| `VERDICT: PASS / PARTIAL / FAIL` | verification → orchestrator | Runtime validation result |