Building an agent-first harness with OpenCode custom tools

Deep Dive · 6 min read

The agent reads the source code. Understands the bug. Proposes a fix. Commits. The bug is still there.

This isn’t a model problem — it’s a tooling problem. Without runtime access, the agent reasons like a developer locked in their editor: it can fix what it sees in the files. Not what’s happening in the browser.

On an Astro blog, this shows up in concrete ways: a Mermaid error that only surfaces at runtime in the browser, an MDX component that breaks rendering on a specific page, a performance regression invisible without measurement. The agent can inspect the MDX source and find nothing. The page is broken anyway.

The solution isn’t a better-worded prompt. It’s giving the agent the tools to observe the problem itself.

The problem: an agent with no eyes and no hands

A typical agent correction loop looks like this: read files, modify code, commit, wait for human feedback. Human feedback is the only signal it has about the real effect of its changes. Everything else is inference from source code.

This model has two structural flaws.

The first: the agent can’t distinguish “the code is correct” from “the application works.” A component can be syntactically valid, type-safe, passing the build — and produce a blank page because of an unescaped {} in a Mermaid block that triggers an acorn error at runtime. The build doesn’t catch it. Neither does the agent, if all it has is the build.

The second: the feedback loop is human. The agent fixes, the human checks, the human reports. Every cycle. On a blog with twelve articles and five custom components, that’s a lot of cycles.

What we want is a loop where the agent can itself verify that its fixes work. Not by reading code — by running the application, navigating the pages, capturing errors. An autonomous feedback loop.

Harness architecture: .opencode/tools/

OpenCode supports custom tools through a .opencode/tools/ directory. Each TypeScript file in that directory exposes tools the agent can call directly. The naming convention is <file>_<export>: a start export in blog_dev.ts becomes blog_dev_start in the agent’s available tools.

Three files, fifteen tools, one agent. That’s the harness:

graph TD
    subgraph project[Astro Project]
        subgraph tools[".opencode/tools/"]
            DEV["blog_dev.ts<br/>start · stop · status<br/>logs · http_get"]
            BUILD["blog_build.ts<br/>run · lint"]
            BROWSER["blog_browser.ts<br/>screenshot · perf · js_errors"]
        end
        subgraph agents[".opencode/agents/"]
            VALIDATOR["blog-validator.md"]
        end
    end

    VALIDATOR -->|invokes| DEV
    VALIDATOR -->|invokes| BUILD
    VALIDATOR -->|invokes| BROWSER

    DEV -->|spawn| ASTRO["astro dev<br/>(port 4321)"]
    BUILD -->|execSync| ASTROBUILD["astro build"]
    BROWSER -->|Playwright| CHROME["Chromium headless"]

    ASTRO -.->|HTTP| DEV
    CHROME -.->|navigates| ASTRO

The SDK is @opencode-ai/plugin. The tool() function acts as an identity function enriched with metadata: a text description the agent uses to decide when to call the tool, a Zod schema for the arguments, and the execute function. Unexported helpers — like httpGet or waitForServer — stay in the file and never appear in the agent’s tool list.

import { tool } from "@opencode-ai/plugin";

export const my_tool = tool({
  description: "What the agent needs to understand to decide when to call this tool.",
  args: {
    path: tool.schema.string().describe("The path to inspect"),
    port: tool.schema.number().optional().default(4321),
  },
  async execute(args, context) {
    // context.worktree = absolute path to the project root
    return JSON.stringify({ result: "..." });
  },
});

context.worktree is the key piece: the absolute path to the project directory, automatically injected by OpenCode. All file paths, spawn cwd values, dynamic require() calls — everything starts from there.

Dev server tools (blog_dev.ts)

Five tools cover the dev server lifecycle: start, stop, status, logs, http_get.

start is the most instructive. The Astro dev server needs to run in the background while the agent works — it can’t block execution. The solution: spawn with detached: true, stdout/stderr redirected to a log file, child.unref() so the process survives its parent.

import fs from "node:fs";
import path from "node:path";
import { spawn } from "node:child_process";
import { tool } from "@opencode-ai/plugin";

export const start = tool({
  description:
    "Starts the Astro dev server in the background. " +
    "Captures stdout/stderr in .opencode/dev-server.log. Returns the PID and port.",
  args: {
    port: tool.schema.number().optional().default(4321),
  },
  async execute(args, context) {
    const port = args.port ?? 4321;
    const opencodeDir = path.join(context.worktree, ".opencode");
    const pidFile = path.join(opencodeDir, "dev-server.pid");
    const portFile = path.join(opencodeDir, "dev-server.port");
    const logFile = path.join(opencodeDir, "dev-server.log");

    // Reset the log
    fs.writeFileSync(logFile, "", "utf-8");

    const logFd = fs.openSync(logFile, "a");
    const child = spawn("npm", ["run", "dev", "--", "--port", String(port)], {
      cwd: context.worktree,
      detached: true,
      stdio: ["ignore", logFd, logFd],
    });
    fs.closeSync(logFd);

    const pid = child.pid!;
    // Persist PID and port so stop, status, and http_get can find the server
    fs.writeFileSync(pidFile, String(pid), "utf-8");
    fs.writeFileSync(portFile, String(port), "utf-8");
    child.unref();

    const reached = await waitForServer(port);
    return JSON.stringify({ pid, port, status: reached ? "started" : "started_unreachable" });
  },
});

The PID file at .opencode/dev-server.pid lets the other tools (stop, status) find the process. waitForServer does HTTP polling on localhost:${port}/ with exponential backoff — it waits up to 15 seconds for the server to actually respond before returning control.
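The article keeps waitForServer unexported and doesn't show it. A minimal sketch of what it likely looks like, assuming Node 18+'s global fetch — the polling intervals and cap here are illustrative, not the author's exact values:

```typescript
// Hypothetical sketch of the unexported waitForServer helper: polls the
// dev server over HTTP with exponential backoff until it responds or the
// overall deadline (~15 s in the article) expires.
async function waitForServer(port: number, maxWaitMs = 15_000): Promise<boolean> {
  let delay = 250; // initial backoff, doubled each attempt
  const deadline = Date.now() + maxWaitMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(`http://localhost:${port}/`);
      if (res.ok) return true; // server answered — it's up
    } catch {
      // connection refused: server not listening yet, keep polling
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, 2_000); // exponential backoff, capped at 2 s
  }
  return false; // caller reports "started_unreachable"
}
```

Returning false instead of throwing matters: the tool still hands the agent the PID, so it can read the log file and diagnose why the server never answered.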

http_get is the most frequently used tool in practice. It lets the agent check that a route returns 200, verify a page isn’t empty, detect unexpected redirects:

export const http_get = tool({
  description: "Makes an HTTP GET request to the local dev server.",
  args: {
    path: tool.schema.string(),
    port: tool.schema.number().optional(),
  },
  async execute(args, context) {
    const portFile = path.join(context.worktree, ".opencode", "dev-server.port");
    const port = args.port ?? (fs.existsSync(portFile)
      ? parseInt(fs.readFileSync(portFile, "utf-8").trim(), 10)
      : 4321);

    const result = await httpGet(`http://localhost:${port}${args.path}`);
    const bodyExcerpt = result.body.slice(0, 500);
    return JSON.stringify({
      status: result.status,
      contentType: result.contentType,
      bodyLength: result.body.length,
      bodyExcerpt,
    });
  },
});

The port is read from .opencode/dev-server.port if not provided — the agent doesn’t need to remember what port the server started on.

Build & lint tools (blog_build.ts)

Two tools: run for the full Astro build, lint for the content linters.

run uses execSync rather than spawn. That’s intentional: a build is synchronous by nature and must complete before the agent can interpret the results. The timeout is 120 seconds — more than enough for an Astro blog, never hit in practice.

import { execSync } from "node:child_process";
import { tool } from "@opencode-ai/plugin";

export const run = tool({
  description:
    "Runs the full Astro build and returns metrics: " +
    "duration, pages generated, dist/ size, errors and warnings.",
  args: {},
  async execute(_args, context) {
    const startTime = Date.now();
    let success = false;
    let output = "";

    try {
      output = execSync("npm run build", {
        cwd: context.worktree,
        env: {
          ...process.env,
          PATH: `${process.env.PATH ?? ""}:/usr/local/bin:/opt/homebrew/bin`,
        },
        encoding: "utf8",
        timeout: 120000,
      });
      success = true;
    } catch (e: any) {
      output = (e.stdout ?? "") + (e.stderr ?? e.message ?? String(e));
      success = false;
    }

    const duration = ((Date.now() - startTime) / 1000).toFixed(1);
    // ...metric extraction from output
    return JSON.stringify({ success, duration, pageCount, distSize, errors, warnings });
  },
});

The tool returns structured metrics: duration, number of generated pages, dist/ directory size, and the last lines of the build output so the agent can read Astro errors directly. No log hunting — everything is in the response.
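The metric extraction is elided in the snippet above. A hedged sketch of what it could look like — the "N page(s) built" summary line is Astro's usual build output, but the regexes and the ten-line tail are illustrative choices, not the author's code:

```typescript
// Hypothetical sketch of the elided metric extraction: pulls the page
// count from Astro's build summary and collects error/warning lines.
function extractBuildMetrics(output: string) {
  // Astro prints a summary like "12 page(s) built in 3.45s"
  const pageMatch = output.match(/(\d+)\s+page\(s\)\s+built/);
  const pageCount = pageMatch ? parseInt(pageMatch[1], 10) : null;

  const lines = output.trim().split("\n");
  const errors = lines.filter((l) => /\berror\b/i.test(l));
  const warnings = lines.filter((l) => /\bwarn(ing)?\b/i.test(l));

  // Last lines of output so the agent can read Astro errors directly
  return { pageCount, errors, warnings, tail: lines.slice(-10) };
}
```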

Tip

The Astro build fails Zod validation if an unknown tag or article type is used in frontmatter. This is the only place these errors surface in an actionable way — not in the dev server, only in the build.
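For context, this is the shape of an Astro content collection schema that enforces such frontmatter — the collection name, article types, and tags below are hypothetical, not this blog's actual values:

```typescript
// Illustrative src/content.config.ts — an unknown `type` or `tags` value
// in any article's frontmatter fails `astro build` with a Zod error.
import { defineCollection, z } from "astro:content";

const articles = defineCollection({
  schema: z.object({
    title: z.string(),
    type: z.enum(["deep-dive", "note", "tutorial"]), // hypothetical types
    tags: z.array(z.enum(["astro", "opencode", "agents"])), // hypothetical tags
  }),
});

export const collections = { articles };
```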

Browser tools (blog_browser.ts)

Three Playwright tools: screenshot, perf, js_errors. This is where the harness gets genuinely interesting.

Playwright is loaded dynamically from the project’s node_modules, not from a global install:

function loadPlaywright(worktree: string) {
  try {
    return require(path.join(worktree, "node_modules", "@playwright/test"));
  } catch (e) {
    throw new Error(
      `Playwright not installed. Run: npm install -D @playwright/test && npx playwright install chromium\n${e}`
    );
  }
}

Why this approach? OpenCode runs in its own Node context, which doesn’t have access to local node_modules via standard require(). Building the absolute path to node_modules/@playwright/test ensures we use exactly the version installed in the project, with the matching Chromium binaries.

js_errors is the most useful tool for agent autonomy:

export const js_errors = tool({
  description:
    "Captures JS runtime errors and console messages from a page. " +
    "Useful for detecting acorn/MDX errors, 404 resources, Astro warnings.",
  args: {
    path: tool.schema.string(),
    port: tool.schema.number().optional().default(4321),
    includeWarnings: tool.schema.boolean().optional().default(false),
  },
  async execute(args, context) {
    const { chromium } = loadPlaywright(context.worktree);
    const browser = await chromium.launch({ headless: true });
    const page = await browser.newPage();

    const errors: string[] = [];
    const warnings: string[] = [];
    const failedRequests: string[] = [];

    page.on("console", (msg) => {
      if (msg.type() === "error") errors.push(msg.text());
      if (msg.type() === "warning" && args.includeWarnings) warnings.push(msg.text());
    });
    page.on("pageerror", (err) => errors.push(err.message));
    page.on("requestfailed", (req) =>
      failedRequests.push(`${req.method()} ${req.url()} — ${req.failure()?.errorText}`)
    );

    const url = `http://localhost:${args.port}${args.path}`;
    try {
      await page.goto(url, { waitUntil: "networkidle" });
    } finally {
      // Always release the browser, even if navigation throws
      await browser.close();
    }

    return JSON.stringify({
      url,
      errors,
      warnings,
      failedRequests,
      summary: errors.length === 0 ? "no errors" : `${errors.length} error(s) detected`,
    });
  },
});

Three listeners cover the three error types that matter to the agent:

  • console error — runtime JavaScript errors, framework warnings
  • pageerror — uncaught exceptions, parse errors
  • requestfailed — 404 resources, unloaded CSS/JS, missing images

pageerror is what catches the acorn errors generated by unescaped {} in Mermaid blocks — those that neither the build nor the dev server surface, but silently break client-side rendering.

perf measures navigation metrics via the PerformanceNavigationTiming API: TTFB, domInteractive, domContentLoaded, load time. Useful for detecting a regression after a layout change or font import.
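The computation perf likely runs inside the page (via Playwright's page.evaluate) can be sketched as a pure function over the navigation entry — the exact field choices here are assumptions based on the metrics the article lists:

```typescript
// Hedged sketch of the metric derivation behind blog_browser_perf,
// extracted as a pure function over a PerformanceNavigationTiming entry.
type NavTiming = {
  requestStart: number;
  responseStart: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
};

function navMetrics(nav: NavTiming) {
  return {
    ttfb: Math.round(nav.responseStart - nav.requestStart), // time to first byte
    domInteractive: Math.round(nav.domInteractive),
    domContentLoaded: Math.round(nav.domContentLoadedEventEnd),
    load: Math.round(nav.loadEventEnd),
  };
}

// Inside the tool, something like:
// const metrics = await page.evaluate(() =>
//   navMetrics(performance.getEntriesByType("navigation")[0] as PerformanceNavigationTiming));
```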

screenshot saves captures to .opencode/screenshots/ with a filename based on the path and a timestamp. The agent can visually see what a page renders — useful for verifying a component displays correctly, not just that it doesn’t throw.
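The article doesn't show how the filename is derived; one plausible sketch — the slugging rules and timestamp format below are hypothetical:

```typescript
// Hypothetical filename derivation for blog_browser_screenshot:
// slug from the route path plus an ISO timestamp, saved as PNG.
function screenshotName(routePath: string, now = new Date()): string {
  const slug =
    routePath
      .replace(/^\/|\/$/g, "") // drop leading/trailing slash
      .replace(/[^a-z0-9]+/gi, "-") || "home"; // "/" becomes "home"
  const stamp = now.toISOString().replace(/[:.]/g, "-"); // filesystem-safe
  return `${slug}-${stamp}.png`;
}
```

Keeping the timestamp in the name means successive captures of the same page never overwrite each other, so the agent can compare before/after renders.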

Putting it together: the blog-validator agent

Tools in isolation are infrastructure. What makes them useful is the agent that orchestrates them.

sequenceDiagram
    participant H as Human / CI
    participant V as blog-validator
    participant DEV as blog_dev_*
    participant BUILD as blog_build_*
    participant BR as blog_browser_*

    H->>V: Validate the blog after changes
    V->>DEV: blog_dev_status
    DEV-->>V: { running: false }
    V->>DEV: blog_dev_start
    DEV-->>V: { pid: 12345, port: 4321, status: "started" }
    V->>BUILD: blog_build_lint
    BUILD-->>V: { content: { success: true }, mermaid: { success: true } }
    V->>DEV: blog_dev_http_get("/")
    DEV-->>V: { status: 200, bodyLength: 112000 }
    V->>DEV: blog_dev_http_get("/en/")
    DEV-->>V: { status: 200, bodyLength: 112000 }
    V->>BR: blog_browser_js_errors("/")
    BR-->>V: { errors: [], summary: "no errors" }
    V->>BR: blog_browser_perf("/articles/my-article")
    BR-->>V: { ttfb: 42, load: 180 }
    V-->>H: Verdict PASS / WARN / FAIL

The blog-validator standard workflow:

  1. blog_dev_status — server running? If not, blog_dev_start
  2. blog_build_lint — content linters before testing routes
  3. blog_dev_http_get on critical routes (/, /en/, /articles/, /en/articles/)
  4. blog_browser_js_errors on the FR homepage — catches runtime errors invisible to the build
  5. Verdict PASS / WARN / FAIL with actionable details

What this changes in practice: the agent can now fix a Mermaid error, verify the error is gone with js_errors, confirm the page still loads in under 200ms with perf, and commit. No human in the loop.

The full cycle — modify, run, observe, fix — runs autonomously. Humans only get involved for FAIL verdicts that require an architectural decision, not for every routine check.

Note

The blog-validator runs in subagent mode in OpenCode — invocable from other agents or from the main chat. You can run it manually after an editing session, or wire it into a post-commit hook for systematic validation.


There’s nothing magic about this harness. It’s fifteen tools, a few hundred lines of TypeScript, and an agent with a clear workflow. What makes it effective is the completeness of the coverage: execution, HTTP observation, log analysis, browser rendering, performance metrics. The agent doesn’t need to guess anymore — it can look.
