# Your agents don't need memory
Open a new agent session. Zero context, zero history. A new contractor showing up on your project for the first time — and they’ll be replaced by an identical contractor next session.
What you put in the next few tokens will shape everything they produce. Two philosophies collide here: give them memory — rules, past decisions, errors to avoid — or give them a harness — a codebase that mechanically encodes its own constraints.
The first approach feels natural. The second is the only one that scales.
## Every session is a new developer on day one
The fundamental problem with AI agents in a development context is that we anthropomorphize their relationship with time. We talk about “memory”, “learning”, “what the agent has picked up”. As if experience accumulated in one session carried over to the next.
It doesn’t. Every new session starts from scratch. The agent doesn’t “remember” that last time it tried `entry.render()` and it blew up in production. It doesn’t “know” that two months of debate landed on using a glob loader with `generateId`. Those things exist somewhere in logs, PRs, discussions — but they’re not in its context, so they don’t exist for it.
This is exactly the situation of a new human developer joining your 1M-line project. No history, no implicit context. How do you onboard them?
If your answer is an 80-page document of “things to know and not do”, you’re going to have two problems: nobody actually reads it, and even the ones who do retain only a fraction — a fraction that goes stale after the first significant refactoring. What works is a codebase with a green CI, a configured linter, tests that describe expected behavior. Documentation exists to explain the why, not to compensate for the absence of mechanical constraints.
Same thing for agents. Except we’ve systematically taken the wrong path.
## Memory as attention debt
The context window is a finite, non-uniform resource. Finite, because even million-token models have a limit. Non-uniform, because an LLM’s attention on a given segment of its context degrades with distance — the “lost in the middle problem” is empirically documented on most current models. A rule buried at token 800k in a saturated context is statistically ignored, regardless of model quality.
Every rule you inject at the start of a session is a token not allocated to the actual task. And as the session progresses — exchanges, file reads, tool outputs, generated code — those rules mechanically recede in the attention window. They don’t disappear, but their effective weight shrinks.
The math is brutal on a real project: if your AGENTS.md has 200 lines of rules and the agent is working on a feature involving 5 files, 3 tool calls each outputting 300 lines, and a few debug rounds, you’re quickly in a context where the rules from the top don’t compete with the immediate operational signal.
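To make that concrete, here is a back-of-envelope version of the scenario. Every figure is an illustrative assumption (roughly 10 tokens per line, ~400 lines per file read), not a measurement:

```typescript
// Rough context-budget sketch for the session described above.
// All numbers are illustrative assumptions, not measurements.
const TOKENS_PER_LINE = 10;

const rules = 200 * TOKENS_PER_LINE;          // AGENTS.md rules, front-loaded
const fileReads = 5 * 400 * TOKENS_PER_LINE;  // 5 files, ~400 lines each
const toolOutput = 3 * 300 * TOKENS_PER_LINE; // 3 tool calls, 300 lines each
const debugging = 4 * 500 * TOKENS_PER_LINE;  // a few debug exchanges

const total = rules + fileReads + toolOutput + debugging;
console.log(`rules are ${((rules / total) * 100).toFixed(1)}% of the context`);
// → rules are 3.9% of the context
```

And that share only shrinks as the session continues.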
That’s not a bug. It’s the nature of an attention-limited system. The real question is: why rely on attention to enforce rules, when deterministic mechanisms exist for exactly that?
Larger context windows don’t solve this problem — they shift it. A 2M-token model will still have attention degradation on segments far from the active signal.
## The binary test
For every piece of knowledge you’d want your agent to “retain”, ask yourself one question:
Can this knowledge be verified in a binary way — pass or fail — without human intervention?
If yes, turn it into a verifiable artifact. That’s not memory, it’s a mechanical constraint. If no, it’s narrative context — it has its place, but as an exception, not the default.
Here’s what that looks like in practice:
| What goes into AGENTS.md | Binary artifact equivalent |
|---|---|
| “Never use `entry.render()` in Astro 6” | Lint rule + explicit error message |
| “Always create both `slug.fr.mdx` AND `slug.en.mdx`” | `lint:content` script that checks for pairs |
| “Use pnpm, not npm” | `package.json` engines + `.npmrc` |
| “This function returns null on empty input” | Type `null \| T` + unit test |
| “No subdirectories inside `content/posts/`” | Structure validation script |
| “Silent build error on nested routes” | Regression test |
Every row represents something that would have gone in an AGENTS.md. Each one has a mechanical equivalent that doesn’t depend on an agent’s attention during a session.
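The first row, for instance, can be mechanized in a few lines. A production version would be a custom ESLint rule working on the AST; this text-scan sketch (with `entry` as an assumed receiver name, per the example above) just shows the shape of the check:

```typescript
// Naive text-based sketch of the "never call entry.render()" rule.
// A real implementation would be a custom ESLint rule matching the AST,
// but the contract is the same: source in, violations out.
function findEntryRenderCalls(source: string): number[] {
  const violations: number[] = [];
  source.split("\n").forEach((line, i) => {
    if (/\bentry\.render\s*\(/.test(line)) violations.push(i + 1); // 1-based line numbers
  });
  return violations;
}

const sample = [
  'import { render } from "astro:content";',
  "const { Content } = await render(entry); // ok",
  "const legacy = await entry.render();     // violation",
].join("\n");

console.log(findEntryRenderCalls(sample));
// → [ 3 ]
```

Wired into CI with a non-zero exit code, the rule no longer depends on anyone's attention, agent or human.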
```mermaid
flowchart LR
    A["New rule\nor decision"] --> B{"Binary test"}
    B -->|"Mechanically\nverifiable"| C["Binary artifact"]
    B -->|"Not verifiable"| D["Exception\ndocumentation"]
    C --> E["Lint rule"]
    C --> F["Test / CI check"]
    C --> G["Type / contract"]
    C --> H["Validation script"]
    D --> I["Index in AGENTS.md\npointing to spec"]
    style C fill:#22c55e,color:#fff
    style D fill:#f59e0b,color:#fff
    style I fill:#f59e0b,color:#fff
```
The green branch is deterministic, always up to date, consumes zero context tokens. The yellow branch is necessary but should stay rare and lean.
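The `lint:content` pair check from the table is a typical green-branch artifact. A sketch of its core, assuming the `slug.fr.mdx` / `slug.en.mdx` naming convention described earlier (reading the real directory is left out):

```typescript
// Given the filenames in the posts directory, report every slug that is
// missing its .fr.mdx or .en.mdx counterpart.
function findUnpairedSlugs(filenames: string[]): string[] {
  const locales = new Map<string, Set<string>>();
  for (const name of filenames) {
    const match = name.match(/^(.+)\.(fr|en)\.mdx$/);
    if (!match) continue;
    const [, slug, locale] = match;
    if (!locales.has(slug)) locales.set(slug, new Set());
    locales.get(slug)!.add(locale);
  }
  return [...locales].filter(([, seen]) => seen.size < 2).map(([slug]) => slug);
}

// In CI: read the real directory, then fail the build on any unpaired slug.
console.log(findUnpairedSlugs(["intro.fr.mdx", "intro.en.mdx", "setup.fr.mdx"]));
// → [ 'setup' ]
```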
## The codebase as exhaustive source of truth
This isn’t just a tooling question. It’s an architectural philosophy: in a well-maintained project, the codebase encodes the constraints that govern its own evolution. That’s the harness.
A harness is the set of automated mechanisms that form the project’s safety net: strict types, linters with custom rules, unit and integration tests, validation scripts, CI pipeline. For an agent arriving at session 0, this harness is the only memory that actually matters. Three reasons.
It’s always up to date. A lint rule blocking `entry.render()` will be true today and in six months. A line in AGENTS.md saying the same thing can become false after a refactoring — and nobody will update the doc, because nobody knows it’s become obsolete.
It’s objective. The CI passes or it doesn’t. No interpretation, no limited attention, no rule buried in context. The agent tries something, the linter blocks, it fixes. The feedback loop is immediate.
It scales. A 1M-line project with 300 rules in an AGENTS.md is unmanageable for an agent and a human alike. A 1M-line project with 300 mechanically encoded constraints is simply a well-maintained project.
```mermaid
flowchart TD
    Agent["Agent in session"] -->|"Attempts a change"| Code["Codebase"]
    Code --> TS["TypeScript strict"]
    Code --> Lint["ESLint + custom rules"]
    Code --> Tests["Unit and integration\ntests"]
    Code --> Scripts["Validation scripts"]
    Code --> CI["CI pipeline"]
    TS -->|"Type error"| Agent
    Lint -->|"Blocking violation"| Agent
    Tests -->|"Test failing"| Agent
    Scripts -->|"Check failed"| Agent
    CI -->|"Red build"| Agent
```
The agent doesn’t need to “know” the rules upfront. It discovers them when it breaks them — immediate feedback, zero upfront context cost.
The harness is especially powerful for past errors. The moment an agent — or a human — introduces a bug that ships to production, the first thing to do is write a regression test. That error can never be “forgotten” again — it’s encoded in the project permanently.
## The documented best practices paradox
The teams that do the best job documenting their rules in AGENTS.md files are often the ones with the best engineering instincts. They correctly identify dangerous patterns, recurring mistakes, critical conventions. That’s the paradox.
But by putting them in a document rather than in code, they’re doing the exact opposite of what their instincts should tell them: creating an unverifiable rule where a verifiable one was possible.
It’s a form of cognitive technical debt. Documentation replaces the work of mechanical encoding because it’s faster — writing “never do X” takes thirty seconds, writing a lint rule takes thirty minutes. But that debt is paid again every session an agent ignores the rule because it sat too deep in the context.
The right heuristic: if a rule deserves to be in AGENTS.md, it deserves even more to be in the harness. AGENTS.md is the fallback for what resists mechanization — not the default solution.
## AGENTS.md as index, not encyclopedia
None of this means AGENTS.md is useless. It just has the wrong job.
Here’s what an AGENTS.md with the wrong philosophy looks like:
```markdown
# Important rules

- Never use entry.render() — use render(entry) from astro:content
- Always create both .fr.mdx and .en.mdx files
- No subdirectories in src/content/posts/
- Use relative paths for imports in MDX files
- Slug must be identical across both language versions
- Mermaid via fenced code blocks only, never as a JSX component
... [40 more rules]
```
Here’s what it should look like:
```markdown
# Project navigation

## Architecture and technical decisions

Architecture decisions and their rationale live in `docs/adr/`.
Review existing ADRs before proposing structural changes.

## Code conventions

Encoded in `eslint.config.js`. The linter is the source of truth — not this file.
Custom rules documented in `docs/specs/lint-rules.md`.

## Content / posts

Structure and conventions: `docs/specs/content-structure.md`
Validation: `npm run lint:content`

## Tests

Expected patterns: `docs/specs/testing.md`
Coverage thresholds: defined in `vitest.config.ts`
```
This file fits in 20 lines. It doesn’t ask the agent to retain anything — it tells it where to go when it needs something. And what it finds at those destinations is specific, current, and usually backed by binary artifacts.
The fundamental difference: an index consumes a handful of tokens at the start of a session to orient the agent. An encyclopedia consumes hundreds of tokens to create an illusion of knowledge that degrades as context fills up.
The right test for your AGENTS.md: does every line point to something verifiable, or does it ask the agent to “remember” something? Lines in the second category belong in the harness, not this file.
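That test itself can be partially mechanized. A rough heuristic sketch, assuming rules live on bullet lines and that a verifiable pointer means a backticked path, script, or command (adjust the patterns to your own conventions):

```typescript
// Flag AGENTS.md bullet lines that state a rule without pointing at
// anything verifiable. Heuristic only: "verifiable pointer" is assumed
// to mean a backticked path, script, or file name.
function linesToMechanize(agentsMd: string): string[] {
  return agentsMd
    .split("\n")
    .filter((line) => line.trim().startsWith("-")) // rule bullets only
    .filter((line) => !/`[^`]+`/.test(line));      // no verifiable pointer
}

const doc = [
  "- Never use entry.render() in Astro 6",
  "- Conventions are encoded in `eslint.config.js`",
].join("\n");

console.log(linesToMechanize(doc));
// → [ '- Never use entry.render() in Astro 6' ]
```

Every line it flags is a candidate for the harness.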
## What actually changes
Shifting from the memory approach to the harness approach changes how you work with agents.
Work moves upstream. Identifying that a rule should exist and writing it as a lint rule or test takes more time than jotting it in a doc. It’s an investment — one that pays off from the second agent session that would have hit the same pattern.
Errors become constructive rather than repetitive. When an agent violates a constraint and the CI blocks, that error is handled once and for all. Without a harness, the same error repeats every session. The agent isn’t forgetting — the rule was sitting in a doc nobody thought to front-load into the context.
And your relationship with AGENTS.md changes. Instead of a file you’re always worried isn’t comprehensive enough, it becomes an index you consult when you need to find something. The cognitive load shifts from “know everything upfront” to “know where to look when needed” — which is exactly how good developers navigate complex projects.
On a 1M-line project, the only reliable memory is the kind encoded in code. Everything else is noise that decays over time.