pure.md — a markdown proxy for AI agents, with built-in MCP
Agents fetching the web fail for three reasons: bot detection, empty SPAs without JS hydration, and opaque PDFs. pure.md is a REST proxy that handles all three — and returns clean markdown, calibrated for LLMs.
The actual problem
An agent calling fetch("https://reuters.com/article/xyz") gets either a 403 or an empty HTML shell. React SPAs have no content without JavaScript. PDFs are binary. The alternatives, Jina AI and Tavily (two LLM-oriented web scraping services), do better, but at a token cost: for the same Wikipedia article, Jina returns 143K tokens, Tavily 55K, and pure.md 28K.
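To put those counts in proportion, here is a small sketch that expresses each service's output as a multiple of the pure.md figure. It uses only the three numbers quoted above (in thousands of tokens); the function name `relative_cost` is made up for this example.

```python
# Token counts quoted above for the same Wikipedia article, in thousands.
token_counts_k = {"r.jina.ai": 143, "tavily.com": 55, "pure.md": 28}


def relative_cost(counts, baseline="pure.md"):
    """Return each service's token count as a multiple of the baseline."""
    base = counts[baseline]
    return {name: round(count / base, 2) for name, count in counts.items()}


if __name__ == "__main__":
    for name, ratio in relative_cost(token_counts_k).items():
        print(f"{name}: {ratio}x")
```

By these numbers the Jina output is roughly five times larger than the pure.md output, and the Tavily output roughly twice as large.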
gantt
title Token count — wikipedia.org/wiki/Artificial_intelligence
dateFormat X
axisFormat %sK
section r.jina.ai
143K tokens : 0, 143
section tavily.com
55K tokens : 0, 55
section pure.md
28K tokens ✓ : 0, 28

pure.md runs a real headless browser to hydrate SPAs, rotates IPs and fingerprints to pass bot detection, and converts HTML, PDFs, and images into lean markdown via HTMLRewriter.
The API: one prefix, that’s it
# Before
GET https://en.wikipedia.org/wiki/Artificial_intelligence # → full HTML, 143K tokens
# After
GET https://pure.md/en.wikipedia.org/wiki/Artificial_intelligence # → clean markdown, 28K tokens

No SDK, no URL transformation. Just a prefix. The global cache skips redundant requests: if someone else already fetched that page, you get the cached version immediately.
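From an agent's side, the prefix rule above could look something like the following minimal sketch, which uses only the Python standard library. The helper names `to_pure_md_url` and `fetch_markdown` and the `User-Agent` value are invented for this example. Carrying the query string over unchanged is an assumption, and the request is anonymous (no API key is sent).

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

PROXY_PREFIX = "https://pure.md/"


def to_pure_md_url(url: str) -> str:
    """Drop the scheme from `url` and prepend the pure.md prefix."""
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {url!r}")
    target = parts.netloc + parts.path
    if parts.query:
        # Assumption: query strings are passed through as-is.
        target += "?" + parts.query
    return PROXY_PREFIX + target


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch `url` through the proxy anonymously and return the body as text."""
    request = Request(to_pure_md_url(url), headers={"User-Agent": "example-agent/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    page = fetch_markdown("https://en.wikipedia.org/wiki/Artificial_intelligence")
    print(page[:500])
```

Keeping the URL rewrite in its own function makes it easy to check without touching the network, and anonymous calls are subject to the rate limits described later.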
POST handles structured extraction: pass a prompt and a JSON schema, a model runs on the page content (Llama 3.1 8B by default, up to DeepSeek R1 distilled 32B), and you get typed JSON back.
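As a rough illustration of that POST flow, the sketch below builds (but does not send) a request carrying a prompt and a JSON schema. The body field names `prompt` and `schema` are assumptions chosen to mirror the description above, not a confirmed request format. Model selection is left out because the text does not say how it is specified.

```python
import json
from urllib.request import Request


def build_extraction_request(page: str, prompt: str, schema: dict) -> Request:
    """Build (but do not send) a POST asking for JSON shaped like `schema`.

    The field names "prompt" and "schema" are assumptions for illustration.
    """
    body = json.dumps({"prompt": prompt, "schema": schema}).encode("utf-8")
    return Request(
        "https://pure.md/" + page,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# JSON Schema describing the typed result we want back.
article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["title", "summary"],
}

request = build_extraction_request(
    "en.wikipedia.org/wiki/Artificial_intelligence",
    "Extract the article title and a two-sentence summary.",
    article_schema,
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and running `json.loads` on the response body; check the service's documentation for the exact field names before relying on this shape.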
The MCP: two tools, four lines
Two tools are exposed: unblock-url to fetch a URL as markdown, search-web to run a query and concatenate results from the top pages.
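An MCP client normally builds these calls for you, but it can help to see roughly what goes over the wire. The sketch below serializes a `tools/call` request in the JSON-RPC 2.0 shape that MCP uses. The tool names come from the text above; the argument names `url` and `query` are hypothetical, since the tools' input schemas are not given here.

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Argument names "url" and "query" are hypothetical.
fetch_msg = tool_call(1, "unblock-url", {"url": "https://reuters.com/article/xyz"})
search_msg = tool_call(2, "search-web", {"query": "artificial intelligence"})
```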
Config in ~/.cursor/mcp.json (same format for Windsurf and Claude Desktop):
{
"mcpServers": {
"pure.md": {
"command": "npx",
"args": ["-y", "puremd-mcp"],
"env": {
"PUREMD_API_KEY": "<TOKEN>"
}
}
}
}

It works anonymously at 6 req/min without a key, or 10 req/min when logged in without a subscription. The free Starter plan starts at 60 req/min with pay-as-you-go pricing ($0.003/fetch, $0.005/search) and comes with $1 in free credit, which is roughly 333 fetches before you spend anything. Plenty to test under real conditions. The Growth plan at $19/month goes up to 600 req/min with $20 in included credits. The API key is optional for exploring and required the moment you put this in production.
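The credit and rate-limit figures above are easy to sanity-check. This sketch uses `Decimal` to avoid floating-point rounding on the prices; the function names are invented for this example, and the prices and limits are the ones quoted in the text.

```python
from decimal import Decimal

FETCH_PRICE = Decimal("0.003")   # dollars per fetch, as quoted above
SEARCH_PRICE = Decimal("0.005")  # dollars per search, as quoted above


def calls_covered(credit: Decimal, unit_price: Decimal) -> int:
    """Number of whole calls that `credit` dollars pays for."""
    return int(credit // unit_price)


def min_interval_seconds(requests_per_minute: int) -> float:
    """Spacing between requests that stays within a per-minute limit."""
    return 60 / requests_per_minute


if __name__ == "__main__":
    print(calls_covered(Decimal("1"), FETCH_PRICE))    # Starter credit, fetches only
    print(calls_covered(Decimal("1"), SEARCH_PRICE))   # Starter credit, searches only
    print(calls_covered(Decimal("20"), FETCH_PRICE))   # Growth credit, fetches only
    print(min_interval_seconds(6))                     # anonymous limit
```

The $1 Starter credit covers 333 fetches or 200 searches, and staying within the anonymous limit of 6 req/min means spacing requests at least 10 seconds apart.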