v1 · stable

CrawlGraph API

A small HTTP API for programmatic backlink lookups, release discovery, and gap-analysis jobs. JSON in, JSON out. Bearer-token auth. Designed to fit into n8n, scripts, dashboards - anywhere you'd write five lines of code instead of clicking.

Available to lifetime-tier customers. Free accounts get zero API calls - pick up a lifetime licence on the landing page if you don't have one yet.

1. Quickstart

One curl command. Replace the token with your own key from the account page.

curl -X POST https://crawlgraph.com/api/v1/backlinks \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com"}'

Response (excerpt)

{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",
  "release_label": "Apr 2026",
  "total_linking_domains": 4821,
  "returned": 1000,
  "results": [
    { "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
      "cg_authority": 84, "cg_rank": 1421 },
    { "linking_domain": "news.bar.org", "num_hosts": 7,  "tld": "org",
      "cg_authority": 71, "cg_rank": 9402 }
  ]
}

2. Authentication

Every request to /api/v1/* needs a bearer token in the Authorization header:

Authorization: Bearer cg_live_<your-key>

Keys are prefixed with cg_live_ and roughly 52 characters long.
Get a key from your account page. You can have up to 10 active keys per user and give each one a label (e.g. production, n8n-bot).
The full key is shown only once at creation. If you lose it, revoke and create a new one - there's no recovery path.
All /api/v1/* endpoints require a valid, non-revoked key tied to an active lifetime user.
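A minimal Python sketch for building the Authorization header. It assumes the key is stored in an environment variable; CG_KEY is simply the name the examples on this page use, not a requirement. The prefix check is only a cheap client-side sanity test. The server still decides whether a key is valid, revoked, or tied to an active lifetime user.

```python
import os
from typing import Optional

KEY_PREFIX = "cg_live_"  # documented key prefix


def auth_headers(key: Optional[str] = None) -> dict:
    """Return request headers carrying a CrawlGraph bearer token.

    Falls back to the CG_KEY environment variable when no key is passed.
    Only the prefix is checked locally; validity is decided server-side.
    """
    key = key if key is not None else os.environ.get("CG_KEY", "")
    if not key.startswith(KEY_PREFIX):
        raise ValueError("expected a key starting with 'cg_live_'")
    return {"Authorization": f"Bearer {key}"}
```

Pass the result as `headers=` to any `requests` call. Failing fast on a missing or mistyped key is friendlier than decoding a 401 auth_missing later.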

3. MCP server (Claude, Cursor, ...)

crawlgraph-mcp is the official Model Context Protocol server that wraps this API, so any MCP client - Claude Desktop, Claude Code, Cursor, Cline, Zed, Windsurf - can run backlink lookups and competitor gap analysis without you writing a line of HTTP. It is open source (MIT) and published on npm.

Install (Claude Desktop / Claude Code)

Add this to your MCP config (claude_desktop_config.json, or .mcp.json for Claude Code). Cursor, Cline, Zed and Windsurf take the same shape.

{
  "mcpServers": {
    "crawlgraph": {
      "command": "npx",
      "args": ["-y", "crawlgraph-mcp"],
      "env": {
        "CRAWLGRAPH_API_KEY": "cg_live_<your-key>"
      }
    }
  }
}

Tools

backlinks - referring domains for a target, with authority scores.
gap_analysis - domains linking to your competitors but not to you.
gap_outreach_targets - the warm-outreach play: the domains that link to all of your competitors but not to you, de-noised (platforms and CDNs filtered) and ranked by authority. The warmest backlink targets you will ever pitch.
releases - list the Common Crawl snapshots you can query.
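Conceptually, the gap_outreach_targets play is set arithmetic over referring domains. The sketch below reproduces the idea locally from lists shaped like the results array of POST /api/v1/backlinks (one call per domain, so it spends backlinks quota). It is an illustration under assumptions, not the MCP server's implementation. NOISE is a hypothetical stand-in for its undocumented platform/CDN filter, and domains with a null cg_authority are ranked last by assumption.

```python
from typing import Iterable

# Hypothetical de-noising list; the real crawlgraph-mcp filter is not documented here.
NOISE = {"facebook.com", "twitter.com", "cloudflare.com", "googleapis.com"}


def outreach_targets(mine: list, competitors: Iterable[list]) -> list:
    """Domains linking to every competitor but not to me, ranked by authority."""
    comp_sets = []
    by_domain = {}
    for results in competitors:
        comp_sets.append({r["linking_domain"] for r in results})
        for r in results:
            by_domain.setdefault(r["linking_domain"], r)
    if not comp_sets:
        return []
    shared = set.intersection(*comp_sets)  # links to ALL competitors
    mine_set = {r["linking_domain"] for r in mine}
    targets = [by_domain[d] for d in shared - mine_set - NOISE]
    # Highest cg_authority first, null authority last (assumption),
    # then alphabetical so ties come out in a stable order.
    targets.sort(key=lambda r: (r["cg_authority"] is None,
                                -(r["cg_authority"] or 0),
                                r["linking_domain"]))
    return targets
```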

Then just ask your assistant

Once it is connected you describe the goal in plain language and the agent runs the tools for you:

"Use gap_outreach_targets for mydomain.com against
competitor-a.com and competitor-b.com, then draft a short
outreach email to each priority target."

Same auth and quotas as the HTTP API - the MCP server is a thin client over the endpoints documented below. Source, issues and the full tool reference live at github.com/pucilpet/crawlgraph-mcp.

4. Quotas & rate limits

Resource	Monthly quota	Counter
backlinks calls	1,000 / mo	per user
gap-analysis jobs	50 / mo	per user
releases lookups	unlimited	not counted

Window is the calendar month in UTC. Hard reset on the 1st at 00:00 UTC - no rollover.
Only successful (2xx) calls count. Validation errors, auth failures, and quota rejections are free.
Failed gap jobs do not refund quota in v1. If something looks wrong, email support and quote the request_id.
A separate IP-based limiter caps bursts at roughly 60 requests per minute on /api/v1/* to protect the backend.
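The reset rule (calendar month, UTC, hard reset on the 1st at 00:00) is easy to compute client-side, for example to schedule a batch for just after quota refills. A minimal standard-library sketch follows. If it ever disagrees with the X-RateLimit-Reset header, trust the header.

```python
from datetime import datetime, timezone


def next_quota_reset(now: datetime) -> int:
    """Unix timestamp of the next 1st-of-month 00:00 UTC strictly after `now`.

    A naive `now` is treated as UTC.
    """
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return int(datetime(year, month, 1, tzinfo=timezone.utc).timestamp())
```

For the separate burst limiter, spacing calls at least one second apart keeps a single client under roughly 60 requests per minute.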

Response headers

Every 2xx and 429 response carries these headers so your client can pace itself:

Header	Meaning
X-RateLimit-Limit-Backlinks	Monthly cap (1000).
X-RateLimit-Remaining-Backlinks	Calls left this month.
X-RateLimit-Limit-Gap	Monthly gap-job cap (50).
X-RateLimit-Remaining-Gap	Gap jobs left this month.
X-RateLimit-Reset	Unix timestamp of the next month rollover.
X-Request-ID	Echo this on support tickets.
Retry-After	Seconds until quota resets. Sent only on 429.
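A sketch of reading these headers into one object and deciding how long to back off. It works with any mapping. `requests`' `r.headers` is case-insensitive, but a plain dict, as used in a test, must match the exact header names. Falling back to X-RateLimit-Reset when Retry-After is absent is a defensive assumption. Per the table above, 429s should always carry Retry-After.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Quota:
    backlinks_remaining: Optional[int]
    gap_remaining: Optional[int]
    reset_at: Optional[int]       # Unix timestamp from X-RateLimit-Reset
    request_id: Optional[str]     # quote this on support tickets


def _int(headers: Mapping[str, str], name: str) -> Optional[int]:
    value = headers.get(name)
    return int(value) if value is not None else None


def parse_quota(headers: Mapping[str, str]) -> Quota:
    """Read CrawlGraph quota headers; missing headers become None."""
    return Quota(
        backlinks_remaining=_int(headers, "X-RateLimit-Remaining-Backlinks"),
        gap_remaining=_int(headers, "X-RateLimit-Remaining-Gap"),
        reset_at=_int(headers, "X-RateLimit-Reset"),
        request_id=headers.get("X-Request-ID"),
    )


def seconds_to_wait(status: int, headers: Mapping[str, str]) -> float:
    """How long to back off before retrying; 0 when no wait is indicated."""
    if status != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    reset_at = _int(headers, "X-RateLimit-Reset")
    return max(0.0, reset_at - time.time()) if reset_at else 0.0
```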

5. Errors

Every non-2xx response uses the same envelope:

{
  "error": "<code>",
  "message": "<human readable>",
  "request_id": "req_a1b2c3d4"
}

Code	Status	Meaning
auth_missing	401	Authorization header missing or malformed.
auth_invalid	401	Key unknown or revoked, or the owner's licence was refunded.
quota_exceeded	429	Monthly quota hit; check Retry-After.
validation_error	400	Request body or query failed validation.
not_found	404	Resource doesn't exist or isn't yours.
internal_error	500	Server bug - quote the request_id.
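Because every error shares one envelope, a single exception type can cover all of them. This is a sketch. The class name and the choice of which codes count as retryable are illustrative assumptions, not part of the API. Retrying is cheap here because non-2xx calls don't count against quota, but quota_exceeded should only be retried after Retry-After.

```python
from typing import Any, Mapping

# Judgment call: codes worth retrying automatically (quota only after Retry-After).
RETRYABLE = {"quota_exceeded", "internal_error"}


class CrawlGraphError(Exception):
    """Raised for a non-2xx response; carries the documented envelope fields."""

    def __init__(self, status: int, envelope: Mapping[str, Any]):
        self.status = status
        self.code = envelope.get("error", "unknown")
        self.request_id = envelope.get("request_id")
        super().__init__(f"{status} {self.code}: {envelope.get('message', '')} "
                         f"(request_id={self.request_id})")

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE
```

Typical use with `requests`: `if not r.ok: raise CrawlGraphError(r.status_code, r.json())`. The request_id then shows up in logs, ready to paste into a support email.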

6. Endpoints

POST /api/v1/backlinks

Synchronous backlink lookup for a single domain. Counts against the backlinks quota.

Request body

{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",   // optional; default = latest
  "limit": 1000,                      // optional; default 1000, max 10000
  "sort": "authority"                 // optional; "authority" (default) | "hosts"
}

Response

{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",
  "release_label": "Apr 2026",
  "total_linking_domains": 4821,
  "returned": 1000,
  "results": [
    { "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
      "cg_authority": 84, "cg_rank": 1421 },
    { "linking_domain": "news.bar.org", "num_hosts":  7, "tld": "org",
      "cg_authority": 71, "cg_rank": 9402 }
  ]
}

Notes

limit caps at 10,000. The API is for programmatic use, not bulk export - use the dashboard for full datasets.
Field names match the internal service: linking_domain, num_hosts, tld. No rename layer.
cg_authority is a 0-100 log-rank percentile derived from Common Crawl's harmonic centrality (higher = more authoritative). cg_rank is the raw PageRank position across the whole graph (1 = top-ranked domain). Both are null for domains that don't appear in the ranks file.
sort="authority" (default) orders by cg_authority DESC then num_hosts DESC; sort="hosts" preserves the legacy num_hosts DESC order.
Malformed domain, unknown release_id, or out-of-range limit → 400 validation_error.
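The sketch below mirrors the documented parameters: it builds a request body that avoids the 400s listed above and re-sorts results locally the way sort="authority" does. Two details are assumptions, since the notes don't specify them: the minimum limit of 1, and placing null cg_authority rows last.

```python
from typing import Optional

MAX_LIMIT = 10_000
SORTS = {"authority", "hosts"}


def backlinks_body(domain: str, release_id: Optional[str] = None,
                   limit: int = 1000, sort: str = "authority") -> dict:
    """Build a POST /api/v1/backlinks body, rejecting out-of-range values early."""
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be 1..{MAX_LIMIT}")
    if sort not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    body = {"domain": domain, "limit": limit, "sort": sort}
    if release_id is not None:  # omit to query the latest release
        body["release_id"] = release_id
    return body


def authority_order(results: list) -> list:
    """Order like sort="authority": cg_authority DESC, then num_hosts DESC.

    Null cg_authority goes last here (assumption; not specified by the API).
    """
    return sorted(results, key=lambda r: (r["cg_authority"] is None,
                                          -(r["cg_authority"] or 0),
                                          -r["num_hosts"]))
```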

curl

curl -X POST https://crawlgraph.com/api/v1/backlinks \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com", "limit": 500}'

Python (requests)

import os, requests

r = requests.post(
    "https://crawlgraph.com/api/v1/backlinks",
    headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
    json={"domain": "example.com", "limit": 500},
    timeout=30,
)
r.raise_for_status()
data = r.json()
print(data["total_linking_domains"], "linking domains")

Node (fetch)

const res = await fetch("https://crawlgraph.com/api/v1/backlinks", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CG_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ domain: "example.com", limit: 500 }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
console.log(data.total_linking_domains, "linking domains");

GET /api/v1/releases

List the Common Crawl releases CrawlGraph has indexed. Read-only and not counted against any quota: use it to discover which release_id values you can pass.

Response

{
  "releases": [
    { "id": "CC-MAIN-2026-04", "label": "Apr 2026", "available": true },
    { "id": "CC-MAIN-2025-50", "label": "Dec 2025", "available": true }
  ]
}

curl

curl https://crawlgraph.com/api/v1/releases \
  -H "Authorization: Bearer cg_live_…"

Python (requests)

import os, requests

r = requests.get(
    "https://crawlgraph.com/api/v1/releases",
    headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
    timeout=30,
)
r.raise_for_status()
for rel in r.json()["releases"]:
    print(rel["id"], rel["label"])

Node (fetch)

const res = await fetch("https://crawlgraph.com/api/v1/releases", {
  headers: { Authorization: `Bearer ${process.env.CG_KEY}` },
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const { releases } = await res.json();
for (const r of releases) console.log(r.id, r.label);
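Omitting release_id already gives you the latest release. Pinning an explicit id is useful when you want repeatable results across runs. A sketch that picks the newest available release, assuming labels always follow the "Apr 2026" pattern shown above. It parses the label rather than trusting list order, which the API doesn't promise.

```python
from datetime import datetime


def latest_release_id(releases: list) -> str:
    """Newest release with available=true, judged by its "Mon YYYY" label."""
    available = [r for r in releases if r.get("available")]
    if not available:
        raise LookupError("no available releases")
    # %b matches English month abbreviations in the default C locale.
    newest = max(available, key=lambda r: datetime.strptime(r["label"], "%b %Y"))
    return newest["id"]
```

Feed it `r.json()["releases"]` from the Python example above.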

POST /api/v1/gap-analysis · GET /api/v1/gap-analysis/{job_id}

Async gap-analysis. POST submits a job (counts against the gap quota); GET polls for status and result. Jobs are retained for 7 days.

POST request body

{
  "my_domain": "example.com",
  "competitor_domains": ["a.com", "b.com", "c.com"]   // 1-5 entries
}

POST response (202)

{
  "job_id": "gap_a1b2c3",
  "status": "queued",
  "poll_url": "/api/v1/gap-analysis/gap_a1b2c3"
}

GET response - running

{
  "job_id": "gap_a1b2c3",
  "status": "running",
  "started_at": "2026-04-27T12:34:56Z",
  "progress_pct": 42
}

GET response - completed

{
  "job_id": "gap_a1b2c3",
  "status": "completed",
  "completed_at": "2026-04-27T12:36:18Z",
  "result": {
    "my_domain": "example.com",
    "competitor_domains": ["a.com", "b.com", "c.com"],
    "gaps": [
      { "linking_domain": "x.com", "found_on": ["a.com", "b.com"] }
    ],
    "total_gaps": 1284
  }
}
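Gaps that appear on several competitors' profiles are usually the strongest outreach candidates. A minimal sketch that groups a completed job's result by len(found_on), most overlap first. If the gaps array holds fewer entries than total_gaps, it only groups what was returned.

```python
from collections import defaultdict


def gaps_by_overlap(result: dict) -> dict:
    """Map competitor count -> gap domains, highest overlap first."""
    buckets = defaultdict(list)
    for gap in result["gaps"]:
        buckets[len(gap["found_on"])].append(gap["linking_domain"])
    return dict(sorted(buckets.items(), reverse=True))
```

Call it on `job["result"]` once status is "completed". The top bucket matches "links to all competitors", the same idea behind gap_outreach_targets.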

GET response - failed

{
  "job_id": "gap_a1b2c3",
  "status": "failed",
  "error": { "code": "internal_error", "message": "..." }
}

Notes

Max 5 competitors per request - matches the dashboard cap.
GET returns 404 not_found if the job isn't yours, even if the id exists.
Failed jobs don't refund quota in v1. Email support with the request_id if it matters.
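Validation errors are free, but checking the 1-5 competitor rule locally saves a round trip and stops a script from looping on bad input. The sketch below also trims, lowercases and de-duplicates competitors so a repeated entry doesn't waste one of the five slots. That normalization is an assumption about what you want, not something the API requires.

```python
MAX_COMPETITORS = 5  # documented cap, same as the dashboard


def gap_body(my_domain: str, competitors: list) -> dict:
    """Build a POST /api/v1/gap-analysis body, enforcing 1-5 unique competitors."""
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(c.strip().lower() for c in competitors if c.strip()))
    if not 1 <= len(unique) <= MAX_COMPETITORS:
        raise ValueError(f"need 1-{MAX_COMPETITORS} competitor domains, got {len(unique)}")
    return {"my_domain": my_domain, "competitor_domains": unique}
```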

curl

# 1. Submit
curl -X POST https://crawlgraph.com/api/v1/gap-analysis \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{"my_domain": "example.com", "competitor_domains": ["a.com","b.com"]}'

# 2. Poll
curl https://crawlgraph.com/api/v1/gap-analysis/gap_a1b2c3 \
  -H "Authorization: Bearer cg_live_…"

Python (requests)

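import os, time, requests

BASE = "https://crawlgraph.com/api/v1"
H = {"Authorization": f"Bearer {os.environ['CG_KEY']}"}

# 1. Submit the job (counts against the gap quota).
sub = requests.post(
    f"{BASE}/gap-analysis",
    headers=H,
    json={
        "my_domain": "example.com",
        "competitor_domains": ["a.com", "b.com"],
    },
    timeout=30,
)
sub.raise_for_status()  # surface 400/401/429 instead of a KeyError below
job_id = sub.json()["job_id"]

# 2. Poll every 5 s until the job finishes; the 10-minute cap is arbitrary.
deadline = time.monotonic() + 600
while True:
    r = requests.get(f"{BASE}/gap-analysis/{job_id}", headers=H, timeout=30)
    r.raise_for_status()
    j = r.json()
    if j["status"] in ("completed", "failed"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"gap job {job_id} still {j['status']}")
    time.sleep(5)
print(j)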

Node (fetch)

const H = { Authorization: `Bearer ${process.env.CG_KEY}` };

// Throw on non-2xx so auth/quota errors don't masquerade as job data.
async function getJson(url, init) {
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

const submit = await getJson("https://crawlgraph.com/api/v1/gap-analysis", {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({
    my_domain: "example.com",
    competitor_domains: ["a.com", "b.com"],
  }),
});

const jobId = submit.job_id;
let job;
do {
  await new Promise(r => setTimeout(r, 5000)); // poll every 5 s
  job = await getJson(
    `https://crawlgraph.com/api/v1/gap-analysis/${jobId}`,
    { headers: H },
  );
} while (job.status !== "completed" && job.status !== "failed");
console.log(job);

GET /api/v1/changes

Quarter-over-quarter diff for a domain. Returns referring domains added or lost between two CC releases. Counts as one call against the backlinks quota - same bucket as POST /api/v1/backlinks.

Today this endpoint is a stub. We have ingested only one Common Crawl quarter, so there is nothing to diff against yet. The shape below is final; the response body just carries comparison_available: false until the next release lands (estimated July 2026).

Query parameters

domain - optional; the target host (same shape as /backlinks).
from - optional; older release id. Defaults to the previous quarter (which doesn't exist yet → stub).
to - optional; newer release id. Defaults to the latest available release.

Response - stub (today)

{
  "comparison_available": false,
  "current_release": "cc-main-2026-jan-feb-mar",
  "next_available_after": "cc-main-2026-apr-may-jun",
  "next_available_estimate": "2026-07-15",
  "message": "first delta will be available after the next quarterly Common Crawl release ingests"
}

curl

curl "https://crawlgraph.com/api/v1/changes?domain=example.com" \
  -H "Authorization: Bearer cg_live_…"
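Until comparison_available flips to true, a script should detect the stub and move on. The sketch checks only that documented flag, because the non-stub response body isn't shown here. "Added or lost" is ordinary set arithmetic, so the sketch also computes it locally from two /backlinks result lists (for example, one per release_id). That illustrates the idea; it is not the endpoint's own logic. /backlinks results are capped by limit, so diffing truncated lists can report spurious changes.

```python
def changes_ready(payload: dict) -> bool:
    """True once /api/v1/changes returns a real comparison instead of the stub."""
    return bool(payload.get("comparison_available", False))


def referring_domain_diff(old: list, new: list) -> dict:
    """Added/lost referring domains between two /backlinks result lists."""
    before = {r["linking_domain"] for r in old}
    after = {r["linking_domain"] for r in new}
    return {"added": sorted(after - before), "lost": sorted(before - after)}
```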

7. Webhooks & changelog

Webhooks are coming in a future version. For now, polling the gap-analysis job endpoint is the only async pattern.

The current API is at v1. Breaking changes will land on /api/v2 with a deprecation window - your v1 integrations won't break overnight. Check back here for changelogs.

8. OpenAPI

For tooling integration (Postman, openapi-typescript, Insomnia, etc.) the OpenAPI 3.1 schema is served at /api/v1/openapi.json. It covers exactly the endpoints documented above and is regenerated on every deploy.
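A quick standard-library sketch for checking which operations the published schema exposes, for example in CI after a deploy. It relies only on the OpenAPI `paths` object, whose path items map HTTP method names to operations. The docs don't say whether the schema route needs a bearer token, so fetch_schema sends none. Add the Authorization header if you get a 401.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list:
    """(METHOD, path) pairs from an OpenAPI 3.x document, sorted by path."""
    ops = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:  # skip path-level keys such as "parameters"
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def fetch_schema(url: str = "https://crawlgraph.com/api/v1/openapi.json") -> dict:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Usage: `for method, path in list_operations(fetch_schema()): print(method, path)`.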