# opencode-team-lead: an agent that plans, delegates, and never writes code
A session that starts clean. The first three tasks land properly, the code is solid, tests pass. Then, somewhere around message 30, something shifts. The agent asks a question you already answered twenty messages ago. It generates code that contradicts what it wrote an hour earlier. And when you ask it to review that code, it tells you everything looks great.
This isn’t a model problem. It’s a design problem.
I built opencode-team-lead to fix exactly this — first for my own development sessions, then as a published plugin. It evolves continuously based on real-world feedback. This article covers v0.8.0.
## What actually breaks in long sessions
Two mechanisms kick in, independently of each other.
The first: context saturation. The longer a session runs, the more the window fills up — file reads, tool outputs, debug back-and-forths, agent responses. In OpenCode, like any serious agentic environment, compactions trigger automatically when the context approaches its limit. A summary replaces the full history. The agent starts fresh. But the decisions made at the beginning of the session — architecture choices, identified constraints, negotiated interfaces — aren’t in the summary. They’re not lost if someone wrote them down somewhere. Nobody wrote them down.
The second is more insidious: an agent reviewing its own code isn’t doing code review. It’s validating itself. The same LLM that generated the code evaluates it with the same implicit assumptions, the same blind spots, the same confirmation bias. It doesn’t catch design errors because it designed the same way. It doesn’t question implementation choices because it made them. The review invariably comes back LGTM — because it’s a self-portrait asking for its own opinion on its appearance.
Both problems have the same outcome: a long session with an all-in-one agent drifts. Not sometimes. Consistently.
## The team-lead pattern: separating thinking from doing
The answer isn’t better prompting. It’s changing who does what.
The pattern borrows from an organizational reality we know well: in a functioning human team, the person who plans and delegates isn’t the person who executes. The lead doesn’t code — they understand the problem, break down the work, dispatch tasks, control quality. Execution belongs to others. This separation doesn’t exist by convention — it exists because the cost of a lead getting into the weeds is too high: they lose the big picture, they get invested in the implementation, and they can no longer evaluate it objectively.
Applied to agents, this gives you an orchestrator that follows one cardinal rule: never read or modify code directly.
And this isn’t a prompt directive. It’s enforced by permissions. The orchestrator literally cannot call read, edit, write, grep, or glob on the codebase. These tools aren’t in its allowed permissions. It only has access to task for delegation, todowrite/todoread for tracking, and a few read-only git commands to check the state of the repo.
The difference between “won’t do” and “can’t do” isn’t trivial. A prompt that says “don’t read directly” can be forgotten, circumvented, or ignored at the wrong moment. A denied permission is deterministic.
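To make this concrete, here is roughly what a deny-by-default permission set for the orchestrator could look like in opencode.json. This is an illustrative sketch, not the plugin's actual shipped configuration (it registers its own defaults, and the exact keys may differ):

```json
{
  "agent": {
    "team-lead": {
      "permission": {
        "task": "allow",
        "todowrite": "allow",
        "todoread": "allow",
        "bash": {
          "git status": "allow",
          "git log": "allow",
          "git diff": "allow",
          "*": "deny"
        },
        "read": "deny",
        "edit": "deny",
        "write": "deny",
        "grep": "deny",
        "glob": "deny"
      }
    }
  }
}
```

Anything not explicitly allowed falls through to deny, so even a tool the config never mentions is unavailable.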
This approach is directly related to the pattern Anthropic documented in their harness engineering work: strict generator/evaluator separation. The difference here is that the separation is enforced at the permission level rather than relying on the model’s expected behavior.
## Architecture: an orchestrator and its delegates
The plugin is called opencode-team-lead. It registers a team-lead agent in OpenCode along with a full review pipeline.
```mermaid
graph TD
    User["User"] --> TL["team-lead agent\n(Orchestrator)"]
    TL -->|"task(explore)"| EX["explore\n(Code reading)"]
    TL -->|"task(general)"| GEN["general\n(Execution)"]
    TL -->|"task(review-manager)"| RM["review-manager\n(Invisible sub-agent)"]
    RM -->|Parallel| RR["requirements-reviewer"]
    RM -->|Parallel| CR["code-reviewer"]
    RM -->|Parallel| SR["security-reviewer"]
    TL -.->|"read/edit only"| SP[".opencode/scratchpad.md"]
```

The orchestrator follows a five-phase cycle: Understand → Plan → Delegate → Review → Synthesize.

- Understand always starts by reading the scratchpad, to pick up existing state if there is any.
- Plan uses sequential-thinking to break down the work and todowrite to make it visible.
- Delegate dispatches each task to the most appropriate agent via task, with a complete brief: relevant file paths, constraints, expected return format.
- Review always goes through the review-manager.
- Synthesize collects results and presents a summary to the user.
The permission model is deny-by-default. Anything not explicitly allowed is blocked. The orchestrator can use DCP tools (distill, prune, compress from opencode-dynamic-context-pruning) to manage its own context throughout the session — that’s the only internal operation it performs. Everything else goes through delegates.
## The scratchpad: memory that survives compactions
The compaction problem deserves a closer look. In a long session on a complex project, compaction is inevitable. And when it triggers, automatic summaries are unreliable about exactly the things that need to be preserved: the task_ids of in-flight delegations, architectural decisions made mid-session, interfaces negotiated with executing agents.
The solution is low-tech — intentionally.
.opencode/scratchpad.md is a Markdown file in the repo. The orchestrator writes its current state there after every significant step: active mission, structured plan with statuses (pending / in_progress / done / blocked), list of active delegations with their task_ids, and a technical summary sufficient to resume without ambiguity. It’s the only file the orchestrator is allowed to read and write.
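As an illustration, a scratchpad mid-mission might look like this. The exact layout is defined by the orchestrator's prompt, so treat the headings, the mission content, and the task_id format below as hypothetical:

```markdown
# Scratchpad — team-lead

## Active mission
Migrate the auth module to token-based sessions

## Plan
- [x] Explore current auth flow (done)
- [ ] Implement token issuance (in_progress, task_id: tsk_a41f, delegate: general)
- [ ] Full review (pending, review-manager)

## Decisions
- Tokens stored server-side, 24h TTL (agreed with executing agent)
- No change to the public /login contract
```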
The plugin implements OpenCode’s experimental.session.compacting hook. When a compaction triggers, the plugin reads the scratchpad and injects it verbatim into the new context with an explicit header. The agent resumes with its state intact.
```mermaid
sequenceDiagram
    participant OC as OpenCode
    participant Plugin as opencode-team-lead
    participant SP as .opencode/scratchpad.md
    participant Agent as team-lead agent
    Agent->>SP: Updates state\n(plan, tasks, task_ids)
    OC->>Plugin: Compaction triggered
    Plugin->>SP: Reads content
    Plugin->>OC: Injects scratchpad\ninto new context
    OC->>Agent: New context\nwith preserved state
```

It's not the most elegant mechanism. But it's deterministic where automatic summaries are probabilistic. What matters for resuming a session is having the right task_ids and the right constraints — not a well-worded summary.
If OpenCode hasn’t triggered a compaction yet and you want to force the agent to write to the scratchpad, you can explicitly ask it to “update its scratchpad” before a long break. This file also works as a clean resume point if you pick up a mission the next day.
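For the curious, the hook wiring can be sketched roughly as follows. The experimental.session.compacting hook name comes from the plugin's documentation; the shape of the context object (ctx.addContext) and the header format are assumptions made for illustration, not OpenCode's actual plugin API:

```typescript
import { readFile } from "node:fs/promises";

const SCRATCHPAD = ".opencode/scratchpad.md";

// Wraps the scratchpad in an explicit header so the agent can
// recognize it as restored state rather than ordinary context.
export function buildInjection(scratchpad: string): string {
  return [
    "=== RESTORED STATE (from .opencode/scratchpad.md) ===",
    scratchpad.trim(),
    "=== END RESTORED STATE ===",
  ].join("\n");
}

// Hypothetical plugin shape: the hook name is real, but the
// ctx object passed to it is an assumption for this sketch.
export const plugin = {
  "experimental.session.compacting": async (ctx: {
    addContext: (s: string) => void;
  }) => {
    // If the scratchpad is missing or empty, inject nothing.
    const content = await readFile(SCRATCHPAD, "utf8").catch(() => "");
    if (content.trim()) ctx.addContext(buildInjection(content));
  },
};
```

The verbatim injection is the point: no summarization step sits between the file and the new context.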
## An independent review pipeline
The other half of the problem. The orchestrator never reviews directly — it delegates that responsibility to the review-manager, a sub-agent that runs in invisible mode within OpenCode’s UI (subagent mode: it doesn’t appear in the agent list, it only exists to be invoked).
The review-manager launches three specialized reviewers in parallel, each with their own fresh context and a distinct scope:
- requirements-reviewer: does the implementation match what was asked? No judgment on the code — functional compliance only.
- code-reviewer: code quality, patterns, maintainability, consistency with the existing codebase.
- security-reviewer: vulnerabilities, exposed configurations, attack surface introduced by the change.
Each reviewer produces an independent verdict with justification. The review-manager weighs the three verdicts against each other and synthesizes them into a single one: APPROVED, CHANGES_REQUESTED, or BLOCKED. The protocol is strict: APPROVED → the orchestrator moves forward; CHANGES_REQUESTED → corrections are re-delegated to the producer with the specific requests; BLOCKED → immediate escalation to the human. Maximum two rounds — beyond that, the human takes over.
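The escalation logic is simple enough to sketch. This is not the plugin's actual code, just a minimal TypeScript rendering of the protocol as described; the names synthesize and nextStep are mine:

```typescript
type Verdict = "APPROVED" | "CHANGES_REQUESTED" | "BLOCKED";

// Severity escalation: a single BLOCKED blocks the whole change;
// otherwise any CHANGES_REQUESTED wins over APPROVED.
export function synthesize(verdicts: Verdict[]): Verdict {
  if (verdicts.includes("BLOCKED")) return "BLOCKED";
  if (verdicts.includes("CHANGES_REQUESTED")) return "CHANGES_REQUESTED";
  return "APPROVED";
}

// Round cap from the protocol: at most two correction rounds,
// then the human takes over no matter what.
export function nextStep(verdict: Verdict, round: number): string {
  if (verdict === "BLOCKED") return "escalate-to-human";
  if (verdict === "CHANGES_REQUESTED") {
    return round >= 2 ? "escalate-to-human" : "re-delegate-fixes";
  }
  return "proceed";
}
```

The severity ordering is the design choice that matters: one reviewer's BLOCKED can never be outvoted by the other two.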
What fundamentally changes compared to the all-in-one agent: the reviewers didn’t write the code. They arrive without production context, without investment in the choices made, without confirmation bias. The requirements-reviewer can note that the feature doesn’t cover the main use case without feeling compelled to minimize the gap. The security-reviewer can flag an attack surface without having to defend the implementation.
That’s structurally what self-review cannot do.
## Getting started
Three lines in opencode.json:
```json
{
  "plugin": [
    "opencode-team-lead@latest",
    "@tarquinen/opencode-dcp@latest"
  ],
  "default_agent": "team-lead"
}
```

The DCP dependency is required. The orchestrator actively uses it to keep its own context clean between delegations. Without it, the orchestrator's context also grows indefinitely with the verbose outputs from delegated agents — the same outputs that created the problem in the first place.
"default_agent": "team-lead" is strongly recommended. Without it, every new session starts with OpenCode’s general agent, and the orchestrator only gets invoked if you select it manually — which you’ll consistently forget to do after three days.
Permissions are extensible without breaking the defaults. If you want the orchestrator to be able to do web searches or run tests:
```json
{
  "agent": {
    "team-lead": {
      "permission": {
        "webfetch": "allow",
        "bash": {
          "npm test": "allow"
        }
      }
    }
  }
}
```

The mergePermissions function combines your overrides with the plugin's default permissions. The orchestrator keeps its existing git rights and additionally gets access to what you add. Overrides don't replace defaults — they extend them.
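The merge semantics can be sketched like this. The plugin does expose a mergePermissions function, but the body below is my reconstruction of the documented behavior (overrides extend defaults), not the actual implementation:

```typescript
// A permission is either a flat decision or a nested map
// (e.g. per-command rules under "bash").
type Perm = "allow" | "deny" | { [cmd: string]: Perm };

// Recursive merge: override keys win on conflict, but default
// keys with no override survive. Overrides extend, not replace.
export function mergePermissions(
  defaults: Record<string, Perm>,
  overrides: Record<string, Perm>,
): Record<string, Perm> {
  const out: Record<string, Perm> = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const base = out[key];
    if (typeof base === "object" && typeof value === "object") {
      out[key] = mergePermissions(base, value); // merge nested maps
    } else {
      out[key] = value; // flat override wins
    }
  }
  return out;
}
```

Under these semantics, adding "npm test" to bash leaves the plugin's default git allowances intact instead of wiping them out.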
The repo is on GitHub (azrod/opencode-team-lead), full documentation at azrod.github.io/opencode-team-lead.
The plugin is primarily tested and developed with Claude Sonnet 4.6 thinking. Thinking mode improves the quality of the Plan and Delegate phases, which involve non-trivial problem decomposition. Other models will work, but results may vary — particularly around how strictly the cardinal rule is respected.
The team-lead pattern isn’t the answer to every agent problem. On a short, well-defined task, the delegation overhead isn’t worth much — a general agent doing the work directly is faster. Where the pattern earns its keep is on long missions involving multiple distinct phases, multiple files, multiple architectural decisions. That’s where compactions cause real damage, where self-review becomes dangerous, and where an orchestrator that never touches code — but knows exactly who to send and why — actually makes a difference.