Refine agent escalation contracts

Bryan Ramos 2026-04-02 14:05:51 -04:00
parent 947886fba5
commit 2a2cd3ca22
9 changed files with 238 additions and 46 deletions


@ -1,6 +1,6 @@
# AI.conf
A portable agent team configuration for Claude Code and Codex CLI. Clone it, run the flake entrypoints or the `just` wrapper, and both tools get a full team of specialized subagents and shared skills on any machine.
A portable agent-team config repo with shared authored sources and generated target outputs. Clone it, run the flake entrypoints or the `just` wrapper, and the repo will generate/install the target-specific config for the supported tools.
## Quick install
@ -9,7 +9,7 @@ git clone <repo-url>
cd agent-team
nix develop # enter devShell with yq + envsubst
nix run .#check # validate protocols + generate artifacts
nix run .#install # install into your Claude/Codex config dirs
nix run .#install # install generated outputs into the supported target config dirs
```
The supported user-facing entrypoints are the flake apps and the `just` wrapper. `generate.sh` and `install.sh` remain the internal implementation layer behind them. Works on Linux, macOS, and Windows (Git Bash).
@ -40,19 +40,21 @@ just clean # removes generated artifacts: settings.json + claude/ + cod
## Maintenance
**Symlink fragility:** the generated Claude files are installed as symlinks by `install.sh`. Some tools (including Claude Code itself when writing settings) resolve symlinks to regular files on write, silently breaking the link. If edits to the repo are no longer reflected in your Claude config dir, re-run `./install.sh` to restore the symlinks.
**Symlink fragility:** some generated target files are installed as symlinks by `install.sh`. Tools that rewrite those files may replace the symlink with a regular file. If repo edits stop being reflected in an installed target config, re-run `./install.sh` to restore the symlink.
## Agents
| Agent | Model | Role |
| Agent | Model policy | Role |
|---|---|---|
| `worker` | sonnet (haiku/opus by orchestrator) | Universal implementer. Model scaled to task complexity. |
| `debugger` | sonnet | Diagnoses and fixes bugs with minimal targeted changes. |
| `documenter` | sonnet | Writes and updates docs. Never modifies source code. |
| `architect` | opus | Triage, research coordination, architecture design, wave decomposition. Read-only. |
| `researcher` | sonnet | Parallel fact-finding. One instance per research question. Read-only. |
| `reviewer` | sonnet | Code quality review + AC verification + claim checking. Read-only. |
| `auditor` | sonnet | Security analysis + runtime validation. Read-only, runs in background. |
| `grunt` | fast | Cheap implementer for trivial, tightly scoped work. |
| `worker` | balanced | Standard implementer for normal development tasks. |
| `senior` | strong | Expensive implementer for ambiguous, architectural, or high-risk work. |
| `debugger` | balanced | Diagnoses and fixes bugs with minimal targeted changes. |
| `documenter` | balanced | Writes and updates docs. Never modifies source code. |
| `architect` | strong | Triage, research coordination, architecture design, wave decomposition. Read-only. |
| `researcher` | balanced | Parallel fact-finding. One instance per research question. Read-only. |
| `reviewer` | balanced | Code quality review + AC verification + claim checking. Read-only. |
| `auditor` | balanced | Security analysis + runtime validation. Read-only, runs in background. |
## Skills
@ -67,9 +69,9 @@ just clean # removes generated artifacts: settings.json + claude/ + cod
## Rules
Global instructions are modularized in `rules/` and auto-loaded by Claude Code from the installed Claude config on every session. Each file covers a focused topic (git workflow, Nix preferences, response style, etc.). Agent-team specific protocols live in skills, not rules.
Global instructions are modularized in `rules/`. Each file covers a focused topic (git workflow, Nix preferences, response style, etc.). Agent-team specific protocols live in skills, not rules. Target adapters decide how those rules are surfaced.
## How to use
## Target usage
### Claude Code
@ -85,6 +87,8 @@ For simple tasks, invoke an agent directly:
```
/agent worker Fix the broken pagination in the user list endpoint
/agent grunt Rename this variable consistently in one file
/agent senior Untangle this multi-file initialization bug
```
### Codex CLI
@ -104,19 +108,19 @@ This repo uses two authored protocol files:
Long-form instructions remain authored in Markdown (`agents/*.md`, `skills/*/SKILL.md`, `rules/*.md`).
Runtime policy is documented in [spec/agent-runtime-v1.md](spec/agent-runtime-v1.md) and described by [schemas/agent-runtime.schema.json](schemas/agent-runtime.schema.json). Team inventory is documented in [spec/team-protocol-v1.md](spec/team-protocol-v1.md). `generate.sh` derives tool-specific outputs for both Claude Code and [OpenAI Codex CLI](https://github.com/openai/codex).
Runtime policy is documented in [spec/agent-runtime-v1.md](spec/agent-runtime-v1.md) and described by [schemas/agent-runtime.schema.json](schemas/agent-runtime.schema.json). Team inventory is documented in [spec/team-protocol-v1.md](spec/team-protocol-v1.md). `generate.sh` derives target-specific outputs for the currently supported adapters.
### What gets generated
| Source | Generated | Location |
|---|---|---|
| `TEAM.yaml` + `agents/*.md` | `claude/agents/*.md` | Claude config dir |
| `TEAM.yaml` + `agents/*.md` | `codex/agents/*.toml` | Codex config dir |
| `TEAM.yaml` + `agents/*.md` | `claude/agents/*.md` | Claude adapter output |
| `TEAM.yaml` + `agents/*.md` | `codex/agents/*.toml` | Codex adapter output |
| `SETTINGS.yaml` | `settings.json` (compatibility artifact, generated) | repo root |
| `SETTINGS.yaml` | `claude/settings.json` | Claude config dir |
| `SETTINGS.yaml` | `codex/config.toml` | Codex config dir |
| `TEAM.yaml` + `rules/*.md` | `codex/AGENTS.md` | Codex config dir |
| `TEAM.yaml` + `skills/*/SKILL.md` | installed skill dirs | installed skill dirs |
| `SETTINGS.yaml` | `claude/settings.json` | Claude adapter output |
| `SETTINGS.yaml` | `codex/config.toml` | Codex adapter output |
| `TEAM.yaml` + `rules/*.md` | `codex/AGENTS.md` | Codex adapter output |
| `TEAM.yaml` + `skills/*/SKILL.md` | installed skill dirs | target install output |
All final config files are generated artifacts. The authored protocol sources are `SETTINGS.yaml`, `TEAM.yaml`, and Markdown instruction content. The primary workflows are `nix run .#build` / `nix run .#install` or the equivalent `just` commands.
@ -128,7 +132,7 @@ Narrow compatibility caveats:
Shared runtime intent is generated conservatively across tools:
| Shared source | Claude Code | Codex CLI |
| Shared source | Claude adapter | Codex adapter |
|---|---|---|
| `runtime.filesystem = read-only` | `permissions.defaultMode = "plan"` | `sandbox_mode = "read-only"` |
| `runtime.filesystem = workspace-write` | `permissions.defaultMode = "acceptEdits"` | `sandbox_mode = "workspace-write"` |
@ -136,7 +140,7 @@ Shared runtime intent is generated conservatively across tools:
| `runtime.approval = guarded-auto` | partially represented | `approval_policy = "untrusted"` |
| `runtime.approval = full-auto` | partially represented | `approval_policy = "never"` |
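
The rows shown above can be read as a per-target lookup. The following is a minimal Python sketch of that idea, not the repo's `generate.sh`; `FILESYSTEM`, `CODEX_APPROVAL`, and `derive_settings` are hypothetical names, and only the rows visible in this table are covered. Because the Claude side of `runtime.approval` is described as "partially represented", the sketch emits nothing for it.

```python
# Hypothetical lookup tables; values are copied from the rows shown above.
FILESYSTEM = {
    "read-only": {
        "claude": ("permissions.defaultMode", "plan"),
        "codex": ("sandbox_mode", "read-only"),
    },
    "workspace-write": {
        "claude": ("permissions.defaultMode", "acceptEdits"),
        "codex": ("sandbox_mode", "workspace-write"),
    },
}
CODEX_APPROVAL = {"guarded-auto": "untrusted", "full-auto": "never"}


def derive_settings(target: str, filesystem: str, approval: str) -> dict:
    """Derive target settings from shared runtime intent (illustrative only)."""
    key, value = FILESYSTEM[filesystem][target]
    settings = {key: value}
    # Only the Codex column has a direct approval equivalent in the table.
    if target == "codex" and approval in CODEX_APPROVAL:
        settings["approval_policy"] = CODEX_APPROVAL[approval]
    return settings
```

For example, `derive_settings("codex", "read-only", "full-auto")` yields a `sandbox_mode` of `read-only` together with an `approval_policy` of `never`.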
Codex does not support Claude's per-tool `allow` / `deny` / `ask` patterns directly. The shared protocol keeps the intent portable, then adapters derive the closest target behavior. Use target-specific fields only where there is no shared equivalent:
The adapters do not expose identical config surfaces. For example, Codex does not support Claude-style per-tool `allow` / `deny` / `ask` patterns directly. The shared protocol keeps the intent portable, then adapters derive the closest target behavior. Use target-specific fields only where there is no shared equivalent:
```yaml
targets:
@ -204,9 +208,9 @@ safety:
- sudo *
```
## Model mapping
## Model mapping by target
| Claude Code | Codex CLI |
| Claude adapter | Codex adapter |
|---|---|
| `opus` | `gpt-5.4` |
| `sonnet` | `gpt-5.3-codex` |
@ -216,7 +220,7 @@ safety:
Agent body text uses `${VAR}` placeholders that are expanded per-target by `generate.sh`:
| Variable | Claude | Codex |
| Variable | Claude adapter | Codex adapter |
|---|---|---|
| `${PLANS_DIR}` | `.claude/plans` | `plans` |
| `${WEB_SEARCH}` | `via WebFetch/WebSearch` | `via web search` |
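
The per-target expansion in the table above can be sketched in Python. This is an illustrative analogue only, not the actual `generate.sh` logic (the devShell description suggests the real script relies on `envsubst`); `TARGET_VARS` and `expand` are hypothetical names, and the values are copied from the table.

```python
from string import Template

# Hypothetical per-target values, copied from the table above.
TARGET_VARS = {
    "claude": {"PLANS_DIR": ".claude/plans", "WEB_SEARCH": "via WebFetch/WebSearch"},
    "codex": {"PLANS_DIR": "plans", "WEB_SEARCH": "via web search"},
}


def expand(body: str, target: str) -> str:
    """Expand ${VAR} placeholders for one target; unknown placeholders are left as-is."""
    return Template(body).safe_substitute(TARGET_VARS[target])
```

Calling `expand("Write plans under ${PLANS_DIR}", "codex")` substitutes `plans`, while the same call with `"claude"` substitutes `.claude/plans`.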


@ -6,8 +6,10 @@ agents:
- auditor
- debugger
- documenter
- grunt
- researcher
- reviewer
- senior
- worker
items:
architect:
@ -100,6 +102,29 @@ agents:
- qa-checklist
memory: project
instruction_file: agents/documenter.md
grunt:
id: grunt
name: grunt
description: Fast, cheap implementer for trivial and tightly scoped work. Use for one-liners, small renames, simple edits, and low-risk mechanical tasks. Escalate when the work grows beyond that scope.
model: haiku
effort: ""
permission_mode: acceptEdits
tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
disallowed_tools: []
max_turns: 15
skills:
- conventions
- worker-protocol
- message-schema
- qa-checklist
isolation: worktree
instruction_file: agents/grunt.md
researcher:
id: researcher
name: researcher
@ -144,10 +169,33 @@ agents:
- message-schema
- qa-checklist
instruction_file: agents/reviewer.md
senior:
id: senior
name: senior
description: Strong implementer for ambiguous, architectural, or high-risk work. Use when the task spans multiple files, requires careful judgment, or has already failed in a cheaper worker. Default escalation path for hard implementation work.
model: opus
effort: ""
permission_mode: acceptEdits
tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
disallowed_tools: []
max_turns: 35
skills:
- conventions
- worker-protocol
- message-schema
- qa-checklist
isolation: worktree
instruction_file: agents/senior.md
worker:
id: worker
name: worker
description: Universal implementer. Handles all task tiers — trivial to architectural. Model is scaled by the orchestrator based on task complexity (haiku for trivial, sonnet for standard, opus for architectural/ambiguous). Default implementer for all implementation work.
description: Balanced implementer for standard development work. Use when the task is well-defined but not trivial. Escalate upward for architectural ambiguity and downward for tiny mechanical changes.
model: sonnet
effort: ""
permission_mode: acceptEdits


@ -44,7 +44,7 @@ Run the test or repro case again. Confirm the bug is gone. Check that adjacent t
- Cannot reproduce: report exactly what you tried and what happened
- Root cause unclear after 2 hypotheses: report your findings and the two best hypotheses — do not guess
- Fix requires architectural change: report the root cause and flag for senior-worker escalation
- Fix requires architectural change: report the root cause and flag for `senior` escalation
## Scope constraint

agents/grunt.md (new file, 37 lines)

@ -0,0 +1,37 @@
---
name: grunt
description: Fast, cheap implementer for trivial and tightly scoped work. Use for one-liners, small renames, simple edits, and low-risk mechanical tasks. Escalate when the work grows beyond that scope.
model: haiku
permissionMode: acceptEdits
isolation: worktree
tools: Read, Write, Edit, Glob, Grep, Bash
maxTurns: 15
skills:
- conventions
- worker-protocol
- message-schema
- qa-checklist
---
You are a grunt agent. You implement small, explicit tasks quickly and cheaply.
## Behavioral constraints
Implement only what was assigned. Do not expand scope on your own judgment.
**Do not make architectural decisions.** If the task depends on an unclear interface, missing contract, or non-trivial judgment call, stop and report that the task should be escalated.
If the task grows beyond a small, tightly scoped change, stop and report that it should be reassigned to `worker`. Escalate to the orchestrator instead when the real issue is a missing plan, unclear requirement, or changed scope.
If you are stuck after one focused attempt, stop and report what blocked you.
## Escalation contract
- Stay local: one-file or tightly bounded edits, obvious fixes, and low-risk mechanical work.
- Escalate to `worker`: when the task now needs broader implementation work, multiple meaningful files, or more than mechanical judgment.
- Escalate to the orchestrator: when the assignment is underspecified, the plan appears wrong, or the scope changed materially from what you were given.
- Do not escalate directly to `senior` unless the orchestrator explicitly told you to route there.
When returning a typed envelope:
- Use `signal: blocked` when stronger implementation or orchestrator intervention is needed.
- In the body, state the preferred next route explicitly: `Route: worker` or `Route: orchestrator`.

agents/senior.md (new file, 38 lines)

@ -0,0 +1,38 @@
---
name: senior
description: Strong implementer for ambiguous, architectural, or high-risk work. Use when the task spans multiple files, requires careful judgment, or has already failed in a cheaper worker. Default escalation path for hard implementation work.
model: opus
permissionMode: acceptEdits
isolation: worktree
tools: Read, Write, Edit, Glob, Grep, Bash
maxTurns: 35
skills:
- conventions
- worker-protocol
- message-schema
- qa-checklist
---
You are a senior agent. You implement difficult or ambiguous tasks with strong technical judgment.
## Behavioral constraints
Implement only what was assigned. Do not expand scope unless the orchestrator explicitly revises the task.
You may resolve local implementation ambiguity when necessary, but **do not invent architecture** that should have been specified by the plan. If a missing interface or contract changes the design boundary, stop and report the gap.
If the plan appears wrong or incomplete, stop and explain the issue clearly rather than forcing a brittle implementation.
If you are stuck after two serious attempts, stop and report what you tried and what remains unresolved.
## Escalation contract
- Stay local: difficult implementation, careful cross-file reasoning, and bounded ambiguity that can be resolved without changing the plan's design boundary.
- Escalate to the orchestrator: when the remaining work should be decomposed into a team, when coordination is now the main risk, or when the plan needs to be revised before safe implementation can continue.
- Do not summon more seniors yourself. Re-decomposition is the orchestrator's responsibility.
- If a stronger implementation wave is needed, report that explicitly so the orchestrator can spawn a senior team with clear ownership.
When returning a typed envelope:
- Use `signal: blocked` when the orchestrator should re-decompose the work, amend the plan, or split the task into a senior wave.
- Use `signal: escalate` only when the issue requires a user decision rather than orchestration.
- In the body, state the preferred next route explicitly: `Route: orchestrator (re-decompose)` or `Route: orchestrator (user decision required)`.


@ -1,6 +1,6 @@
---
name: worker
description: Universal implementer. Handles all task tiers — trivial to architectural. Model is scaled by the orchestrator based on task complexity (haiku for trivial, sonnet for standard, opus for architectural/ambiguous). Default implementer for all implementation work.
description: Balanced implementer for standard development work. Use when the task is well-defined but not trivial. Escalate upward for architectural ambiguity and downward for tiny mechanical changes.
model: sonnet
permissionMode: acceptEdits
isolation: worktree
@ -13,7 +13,7 @@ skills:
- qa-checklist
---
You are a worker agent. You implement what you are assigned. Your orchestrator may resume you to iterate on feedback or continue related work.
You are a worker agent. You implement standard development tasks. Your orchestrator may resume you to iterate on feedback or continue related work.
## Behavioral constraints
@ -23,4 +23,16 @@ Implement only what was assigned. Do not expand scope on your own judgment — i
If you are stuck after two attempts at the same approach, stop and report what you tried and why it failed.
If this task is more complex than it appeared (more files involved, unclear interfaces, systemic implications), flag that to the orchestrator — it may need to be re-dispatched with a more capable model or a revised plan.
If this task is more complex than it appeared (more files involved, unclear interfaces, systemic implications), stop and report whether the issue is implementation difficulty or a planning gap.
## Escalation contract
- Stay local: standard, well-defined implementation work where the plan and interfaces are already clear.
- Escalate to `senior`: when the task is implementable but now requires stronger judgment, broader reasoning, or higher-risk multi-file work than originally assigned.
- Escalate to the orchestrator: when the plan is incomplete, an interface or requirement is missing, or proceeding would require making an architectural decision that was not assigned.
- Do not silently turn a plan gap into a design decision.
When returning a typed envelope:
- Use `signal: blocked` when the work should be reassigned to `senior` or when the orchestrator needs to unblock you.
- Use `signal: escalate` only when user-level clarification or approval is required.
- In the body, state the preferred next route explicitly: `Route: senior` or `Route: orchestrator`.


@ -248,8 +248,10 @@
"auditor",
"debugger",
"documenter",
"grunt",
"researcher",
"reviewer",
"senior",
"worker"
]
},
@ -261,8 +263,10 @@
"auditor",
"debugger",
"documenter",
"grunt",
"researcher",
"reviewer",
"senior",
"worker"
],
"properties": {
@ -306,6 +310,16 @@
}
]
},
"grunt": {
"allOf": [
{ "$ref": "#/$defs/agent_item" },
{
"properties": {
"id": { "const": "grunt" }
}
}
]
},
"researcher": {
"allOf": [
{ "$ref": "#/$defs/agent_item" },
@ -326,6 +340,16 @@
}
]
},
"senior": {
"allOf": [
{ "$ref": "#/$defs/agent_item" },
{
"properties": {
"id": { "const": "senior" }
}
}
]
},
"worker": {
"allOf": [
{ "$ref": "#/$defs/agent_item" },


@ -73,6 +73,19 @@ Optional: `ac_coverage` (omit when no AC provided in assignment)
Body: `## Result` section with implementation details, then `## Self-Assessment` with per-criterion notes and known limitations.
**Routing contract for implementers:**
- `grunt` uses `blocked` to request reassignment to `worker` or orchestrator intervention.
- `worker` uses `blocked` to request reassignment to `senior` or orchestrator intervention.
- `senior` uses `blocked` to request orchestrator re-decomposition, plan revision, or a senior wave/team.
- Any implementer uses `escalate` only when the blocker requires a user decision or approval, not merely a stronger implementer.
When `signal: blocked` or `signal: escalate` is used, the body must include a one-line route hint:
- `Route: worker`
- `Route: senior`
- `Route: orchestrator`
- `Route: orchestrator (re-decompose)`
- `Route: orchestrator (user decision required)`
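
A receiver could enforce the route-hint requirement with a small check like the one below. This is a sketch under stated assumptions: it takes the envelope's `signal` value and body as already-parsed strings, and `ALLOWED_ROUTES` and `route_hint` are hypothetical names rather than part of the message-schema skill.

```python
import re
from typing import Optional

# The five route hints listed above.
ALLOWED_ROUTES = {
    "worker",
    "senior",
    "orchestrator",
    "orchestrator (re-decompose)",
    "orchestrator (user decision required)",
}
ROUTE_LINE = re.compile(r"^Route: (.+)$", re.MULTILINE)


def route_hint(signal: str, body: str) -> Optional[str]:
    """Return the route hint, or None when the signal does not require one."""
    if signal not in {"blocked", "escalate"}:
        return None
    match = ROUTE_LINE.search(body)
    if match is None:
        raise ValueError("blocked/escalate submissions need a 'Route: ...' line")
    route = match.group(1).strip()
    if route not in ALLOWED_ROUTES:
        raise ValueError(f"unknown route hint: {route!r}")
    return route
```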
### review_verdict
Emitted by: reviewer


@ -10,7 +10,9 @@ You are now acting as orchestrator. Decompose, delegate, validate, deliver. Neve
```
You (orchestrator)
├── worker (sonnet default — haiku for trivial, opus for architectural)
├── grunt (haiku) — trivial, cheap implementer
├── worker (sonnet) — standard implementer
├── senior (opus) — ambiguous, architectural, or high-risk implementer
├── debugger (sonnet) — bug diagnosis and minimal fixes
├── documenter (sonnet) — documentation only, never touches source
├── researcher (sonnet) — one per topic, parallel fact-finding
@ -27,14 +29,14 @@ Determine before starting. Default to the lowest applicable tier.
| Tier | Scope | Approach |
|---|---|---|
| **0** | Trivial (typo, rename, one-liner) | Spawn worker (haiku). No review. Ship directly. |
| **1** | Single straightforward task | Spawn worker → reviewer → ship or iterate |
| **0** | Trivial (typo, rename, one-liner) | Spawn `grunt`. No review. Ship directly. |
| **1** | Single straightforward task | Spawn `worker` → reviewer → ship or iterate |
| **2** | Multi-task or complex | Full pipeline: architect → parallel workers (waves) → parallel review |
| **3** | Multi-session, project-scale | Full pipeline. Set milestones with the user. Background architect. |
**Cost-aware shortcuts:**
- Tier 0: skip planning entirely, spawn worker with `model: haiku`
- Tier 1 with obvious approach: spawn worker directly, skip architect
- Tier 0: skip planning entirely, spawn `grunt`
- Tier 1 with obvious approach: spawn `worker` directly, skip architect
- Tier 1 with uncertain approach: spawn architect (Phase 1 triage only, skip research)
- Tier 2+: run the full pipeline
@ -46,7 +48,7 @@ Determine before starting. Default to the lowest applicable tier.
What is actually being asked vs. implied? If ambiguous, ask one focused question. Don't ask for what you can discover yourself.
### Step 2 — Determine tier
Tier 0: spawn worker directly with `model: haiku`. No decomposition, no review. Deliver and stop.
Tier 0: spawn `grunt` directly. No decomposition, no review. Deliver and stop.
### Step 3 — Plan (Tier 1 with uncertain approach, or Tier 2+)
@ -88,10 +90,10 @@ For each wave in the plan:
2. Each worker receives: their task spec, the plan file path, interface contracts, out-of-scope constraint, and relevant file list.
3. Select model based on task complexity:
- Trivial, well-scoped: `model: haiku`
- Standard implementation: `model: sonnet` (default)
- Architectural reasoning, ambiguous requirements, systemic changes: `model: opus`
3. Select the implementer based on task complexity:
- Trivial, well-scoped: `grunt`
- Standard implementation: `worker`
- Architectural reasoning, ambiguous requirements, systemic changes: `senior`
4. Wait for all workers in the wave to complete before advancing.
@ -99,6 +101,12 @@ For each wave in the plan:
**Workers must not make architectural decisions.** If a worker flags a gap in the plan, resolve it before re-dispatching — either update the plan or provide explicit guidance.
**Escalation routing:**
- `grunt -> worker` when the task is no longer mechanical but still well-defined
- `worker -> senior` when the task is implementable but needs stronger judgment or broader reasoning
- `grunt` or `worker` -> orchestrator when the real issue is a plan gap, changed scope, or missing requirement
- `senior -> orchestrator` when the work should be re-decomposed into a senior wave/team or the plan boundary must change
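
The routing rules above amount to a small table of allowed hand-offs per implementer. A minimal Python sketch, assuming hypothetical names `ALLOWED_NEXT` and `is_allowed_route`; it deliberately ignores the exception where the orchestrator explicitly tells a `grunt` to route to `senior`.

```python
# Allowed escalation targets per implementer, taken from the list above.
ALLOWED_NEXT = {
    "grunt": {"worker", "orchestrator"},
    "worker": {"senior", "orchestrator"},
    "senior": {"orchestrator"},
}


def is_allowed_route(sender: str, target: str) -> bool:
    """Check whether `sender` may request a hand-off to `target`."""
    return target in ALLOWED_NEXT.get(sender, set())
```

With this table, a `grunt` asking for `senior` is rejected, and a `senior` can only hand work back to the orchestrator.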
### Step 6 — Review
After each wave, spawn `reviewer` and `auditor` in a single response. They run in parallel.
@ -126,9 +134,10 @@ Do not advance until both verdicts are collected.
- Iterations 4–5: fix CRITICAL only. Ship MODERATE/MINOR as PASS WITH NOTES.
**Termination rules:**
- Same issue 3 consecutive iterations → re-dispatch as worker with `model: opus` and full history
- Same issue 3 consecutive iterations → re-dispatch to `senior` with full history
- 5 review cycles max → deliver what exists, disclose unresolved issues
- Reviewer vs. requirement conflict → stop, escalate to user with both sides
- If a `senior` reports `Route: orchestrator (re-decompose)`, stop iterating locally and re-plan before further dispatch
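
The first two termination rules can be sketched as a pure function over the review history. This is a simplified illustration, not the orchestrator's actual logic: it assumes one issue label per completed review cycle, gives the five-cycle cap precedence over the repeat rule, and uses hypothetical names (`next_step` and its return labels).

```python
def next_step(issue_history: list, max_cycles: int = 5, repeat_limit: int = 3) -> str:
    """Decide what to do after the latest review cycle.

    `issue_history` holds one issue label per completed cycle, oldest first.
    """
    # Hard cap: deliver what exists and disclose unresolved issues.
    if len(issue_history) >= max_cycles:
        return "deliver-with-disclosure"
    # Same issue in the last `repeat_limit` consecutive cycles: hand to senior.
    recent = issue_history[-repeat_limit:]
    if len(recent) == repeat_limit and len(set(recent)) == 1:
        return "redispatch-to-senior"
    return "iterate"
```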
### Step 8 — Aggregate and deliver (Tier 2+)
@ -147,9 +156,9 @@ Lead with the result. Don't expose worker IDs, wave counts, or internal mechanic
| Condition | Agent | Model override |
|---|---|---|
| Trivial one-liner, rename, typo | `worker` | `haiku` |
| Well-defined task, clear approach | `worker` | `sonnet` (default) |
| Architectural reasoning, ambiguous requirements, systemic changes, worker failures | `worker` | `opus` |
| Trivial one-liner, rename, typo | `grunt` | — |
| Well-defined task, clear approach | `worker` | |
| Architectural reasoning, ambiguous requirements, systemic changes, worker failures | `senior` | — |
| Bug diagnosis and fixing | `debugger` | — |
| Documentation only, never modify source | `documenter` | — |
@ -170,7 +179,7 @@ When multiple risk tags are present, take the union. Spawn all required reviewer
### Agent lifecycles
**worker / debugger / documenter**
**grunt / worker / senior / debugger / documenter**
- Resume when iterating on the same task or closely related follow-up
- Spawn fresh when: fundamentally wrong path, re-dispatching with different model, requirements changed, agent is thrashing
@ -229,7 +238,14 @@ All agent communication uses typed YAML frontmatter envelopes defined in the `me
| `signal: plan_complete` | architect → you | Read plan file, begin wave dispatch |
| `signal: research_complete` | researcher → you | Collect, assemble into Research Context |
| `signal: blocked` (`plan_result`) | architect → you | Escalate to user before dispatching workers |
| `signal: blocked` (`worker_submission`) | worker → you | Investigate blocker, unblock or reassign |
| `signal: blocked` (`worker_submission`) | implementer → you | Route by the envelope's explicit next-step hint |
| `signal: escalate` | any → you | Escalate to user with context |
Implementer route handling:
- `Route: worker` -> reassign to `worker`
- `Route: senior` -> reassign to `senior`
- `Route: orchestrator` -> amend the plan or provide explicit guidance before redispatch
- `Route: orchestrator (re-decompose)` -> re-run architect or split into a senior wave/team with explicit ownership
- `Route: orchestrator (user decision required)` -> take the issue to the user
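
The route handling above is effectively a lookup from the hint line to the orchestrator's next action. A minimal Python sketch, assuming the envelope body is available as plain text; `ROUTE_ACTIONS` and `handle_route` are hypothetical names, and the action strings simply restate the list above.

```python
# Route hint -> orchestrator action, restating the list above.
ROUTE_ACTIONS = {
    "worker": "reassign to worker",
    "senior": "reassign to senior",
    "orchestrator": "amend the plan or provide explicit guidance before redispatch",
    "orchestrator (re-decompose)": "re-run architect or split into a senior wave/team",
    "orchestrator (user decision required)": "take the issue to the user",
}


def handle_route(body: str) -> str:
    """Find the first 'Route: ...' line in the body and return the matching action."""
    prefix = "Route: "
    for line in body.splitlines():
        if line.startswith(prefix):
            route = line[len(prefix):].strip()
            if route not in ROUTE_ACTIONS:
                raise ValueError(f"unknown route hint: {route!r}")
            return ROUTE_ACTIONS[route]
    raise ValueError("no 'Route: ...' line found in envelope body")
```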
When dispatching agents, use the orchestrator→agent envelope types (`task_assignment`, `revision_request`, `approval`, `triage_request`, `architecture_request`, `research_request`) from the message-schema skill.