pure.md — a markdown proxy for AI agents, with built-in MCP

Brief · 2 min read

Agents fetching the web fail for three reasons: bot detection, empty SPAs without JS hydration, and opaque PDFs. pure.md is a REST proxy that handles all three — and returns clean markdown, calibrated for LLMs.

The actual problem

An agent that calls fetch("https://reuters.com/article/xyz") gets either a 403 or an empty HTML shell. React SPAs have no content until JavaScript runs. PDFs come back as binary. The alternatives, Jina AI and Tavily (two LLM-oriented web scraping services), do better, but at a token cost. For the same Wikipedia article, Jina returns 143K tokens, Tavily 55K, and pure.md 28K.

gantt
    title Token count — wikipedia.org/wiki/Artificial_intelligence
    dateFormat X
    axisFormat %sK
    section r.jina.ai
        143K tokens : 0, 143
    section tavily.com
        55K tokens  : 0, 55
    section pure.md
        28K tokens ✓ : 0, 28

pure.md runs a real headless browser to hydrate SPAs, rotates IPs and fingerprints to pass bot detection, and converts HTML/PDF/images into lean markdown via HTMLRewriter.

The API: one prefix, that’s it

# Before
GET https://en.wikipedia.org/wiki/Artificial_intelligence  # → full HTML

# After
GET https://pure.md/en.wikipedia.org/wiki/Artificial_intelligence  # → clean markdown, 28K tokens

No SDK, no URL transformation. Just a prefix. The global cache skips redundant requests — if someone else already fetched that page, you get the cached version immediately.
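As a rough illustration of the prefix rule, here is a minimal Python sketch. It mirrors the example above, where the scheme is dropped and the rest of the address is appended to https://pure.md/. The helper names `to_pure_md` and `fetch_markdown` are hypothetical, and the UTF-8 response encoding is an assumption.

```python
import urllib.request
from urllib.parse import urlsplit

PROXY_PREFIX = "https://pure.md/"


def to_pure_md(url: str) -> str:
    """Rewrite an http(s) URL into its pure.md form, as in the example above."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query  # keep the query string; the fragment is dropped
    return PROXY_PREFIX + rest


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch a page through the proxy (anonymous request, no API key)."""
    with urllib.request.urlopen(to_pure_md(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumption: the body is UTF-8 text


print(to_pure_md("https://en.wikipedia.org/wiki/Artificial_intelligence"))
# https://pure.md/en.wikipedia.org/wiki/Artificial_intelligence
```

Calling `fetch_markdown(...)` performs the actual network request, so it is subject to the rate limits of whichever tier you are on.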

POST handles structured extraction: pass a prompt and a JSON schema, a model runs on the page content (Llama 3.1 8B by default, up to DeepSeek R1 distilled 32B), and you get typed JSON back.
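A structured extraction call might look like the sketch below. The article only says that the POST takes a prompt and a JSON schema, so the body field names (`prompt`, `schema`), the `x-puremd-api-token` header, and the `example.com/pricing` target are all assumptions to check against the pure.md documentation. The function builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

PROXY_PREFIX = "https://pure.md/"


def build_extraction_request(
    page: str, prompt: str, schema: dict, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST asking for typed JSON from a page."""
    body = {"prompt": prompt, "schema": schema}  # assumed field names
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-puremd-api-token"] = api_key  # assumed header name
    return urllib.request.Request(
        PROXY_PREFIX + page,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical schema: ask for a single string field named "title".
title_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
}
req = build_extraction_request(
    "example.com/pricing", "Extract the page title.", title_schema
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and parse the response body with `json.loads`.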

The MCP: two tools, four lines

Two tools are exposed: unblock-url to fetch a URL as markdown, search-web to run a query and concatenate results from the top pages.
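Under the hood, an MCP client invokes these tools with JSON-RPC `tools/call` messages. The sketch below builds such messages in Python. The envelope follows the MCP specification, but the argument names `url` and `query` are assumptions, since the article does not list the tools' input schemas.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Argument names below are assumed, not taken from the article.
fetch_msg = tool_call(1, "unblock-url", {"url": "https://example.com"})
search_msg = tool_call(2, "search-web", {"query": "markdown proxy for agents"})
print(fetch_msg)
print(search_msg)
```

In practice the editor or agent framework sends these messages for you; the sketch is only meant to show what each tool receives.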

Config in ~/.cursor/mcp.json (same format for Windsurf and Claude Desktop):

{
  "mcpServers": {
    "pure.md": {
      "command": "npx",
      "args": ["-y", "puremd-mcp"],
      "env": {
        "PUREMD_API_KEY": "<TOKEN>"
      }
    }
  }
}

pure.md works anonymously: 6 req/min without a key, 10 req/min when logged in without a subscription. The free Starter plan starts at 60 req/min with pay-as-you-go pricing ($0.003/fetch, $0.005/search) and comes with $1 in free credit, roughly 333 fetches before you spend anything, which is plenty to test under real conditions. The Growth plan at $19/month goes up to 600 req/min with $20 in included credits. The API key is optional for exploring and required the moment you put this in production.
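The arithmetic behind those figures can be checked with a few lines of Python. This sketch works in integer thousandths of a dollar to avoid float rounding; the helper names are illustrative, and the prices and limits are the ones quoted above.

```python
MILLS_PER_DOLLAR = 1000  # 1 mill = $0.001
FETCH_COST = 3           # $0.003 per fetch
SEARCH_COST = 5          # $0.005 per search


def calls_covered(credit_dollars: int, cost_mills: int) -> int:
    """Whole number of calls that a credit balance pays for."""
    return credit_dollars * MILLS_PER_DOLLAR // cost_mills


def seconds_between_requests(requests_per_minute: int) -> float:
    """Minimum spacing between calls to stay under a per-minute limit."""
    return 60 / requests_per_minute


print(calls_covered(1, FETCH_COST))    # 333 fetches on the $1 Starter credit
print(calls_covered(1, SEARCH_COST))   # 200 searches on the same credit
print(calls_covered(20, FETCH_COST))   # 6666 fetches on the $20 Growth credit
print(seconds_between_requests(6))     # 10.0 seconds apart when anonymous
```

A client that waits `seconds_between_requests(limit)` between calls stays within the limit of its tier without needing to handle rate-limit errors.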
