v1 · stable
CrawlGraph API
A small HTTP API for programmatic backlink lookups, release discovery, and gap-analysis jobs. JSON in, JSON out. Bearer-token auth. Designed to fit into n8n, scripts, dashboards - anywhere you'd write five lines of code instead of clicking.
Available to lifetime-tier customers. Free accounts get zero API calls - pick up a lifetime licence on the landing page if you don't have one yet.
1. Quickstart
Four lines of curl. Replace the token with your own from the account page.
curl -X POST https://crawlgraph.com/api/v1/backlinks \
-H "Authorization: Bearer cg_live_…" \
-H "Content-Type: application/json" \
-d '{"domain": "example.com"}'
Response (excerpt)
{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",
  "release_label": "Apr 2026",
  "total_linking_domains": 4821,
  "returned": 1000,
  "results": [
    { "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
      "cg_authority": 84, "cg_rank": 1421 },
    { "linking_domain": "news.bar.org", "num_hosts": 7, "tld": "org",
      "cg_authority": 71, "cg_rank": 9402 }
  ]
}
2. Authentication
Every request to /api/v1/* needs a bearer token in the Authorization header:
Authorization: Bearer cg_live_<your-key>
- Keys are prefixed with cg_live_ and roughly 52 characters long.
- Get a key from your account page. You can have up to 10 active keys per user and label each one (e.g. production, n8n-bot).
- The full key is shown only once at creation. If you lose it, revoke and create a new one - there's no recovery path.
- All /api/v1/* endpoints require a valid, non-revoked key tied to an active lifetime user.
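A client can catch an obviously misconfigured key before any request leaves the machine. The sketch below builds the Authorization header from an environment variable and checks only the documented cg_live_ prefix. The helper name auth_headers is hypothetical, the CG_KEY variable name is borrowed from the examples further down this page, and the key length is deliberately not checked because it is only given as approximate.

```python
import os

KEY_PREFIX = "cg_live_"  # documented prefix for API keys


def auth_headers(env_var: str = "CG_KEY") -> dict[str, str]:
    """Build request headers from an API key stored in the environment.

    Hypothetical helper: only the documented prefix is checked locally.
    Whether the key is valid or revoked is still decided by the server.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"{env_var} does not start with {KEY_PREFIX!r}")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing fast here turns a copy-paste mistake into a local error instead of a 401 auth_invalid from the API.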
3. MCP server (Claude, Cursor, ...)
crawlgraph-mcp is the official Model Context Protocol server that wraps this API, so any MCP client - Claude Desktop, Claude Code, Cursor, Cline, Zed, Windsurf - can run backlink lookups and competitor gap analysis without you writing a line of HTTP. It is open source (MIT) and published on npm.
Install (Claude Desktop / Claude Code)
Add this to your MCP config (claude_desktop_config.json, or .mcp.json for Claude Code). Cursor, Cline, Zed and Windsurf take the same shape.
{
  "mcpServers": {
    "crawlgraph": {
      "command": "npx",
      "args": ["-y", "crawlgraph-mcp"],
      "env": {
        "CRAWLGRAPH_API_KEY": "cg_live_<your-key>"
      }
    }
  }
}
Tools
- backlinks - referring domains for a target, with authority scores.
- gap_analysis - domains linking to your competitors but not to you.
- gap_outreach_targets - the warm-outreach play: the domains that link to all of your competitors but not to you, de-noised (platforms and CDNs filtered) and ranked by authority. The warmest backlink targets you will ever pitch.
- releases - list the Common Crawl snapshots you can query.
Then just ask your assistant
Once it is connected you describe the goal in plain language and the agent runs the tools for you:
"Use gap_outreach_targets for mydomain.com against
competitor-a.com and competitor-b.com, then draft a short
outreach email to each priority target."
Same auth and quotas as the HTTP API - the MCP server is a thin client over the endpoints documented below. Source, issues and the full tool reference live at github.com/pucilpet/crawlgraph-mcp.
4. Quotas & rate limits
| Resource | Monthly quota | Counter |
|---|---|---|
| backlinks calls | 1,000 / mo | per user |
| gap-analysis jobs | 50 / mo | per user |
| releases lookups | unlimited | not counted |
- Window is the calendar month in UTC. Hard reset on the 1st at 00:00 UTC - no rollover.
- Only successful (2xx) calls count. Validation errors, auth failures, and quota rejections are free.
- Failed gap jobs do not refund quota in v1. If something looks wrong, email support and quote the request_id.
- A separate IP-based limiter caps bursts at roughly 60 requests per minute on /api/v1/* to protect the backend.
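Because the window is the calendar month in UTC, the next reset can be computed locally without calling the API. This is a minimal sketch of that arithmetic; the function name next_quota_reset is illustrative and not part of the API.

```python
from datetime import datetime, timezone


def next_quota_reset(now: datetime) -> datetime:
    """Return 00:00 UTC on the 1st of the month after `now`.

    `now` must be timezone-aware so the month boundary is evaluated in UTC.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    if now_utc.month == 12:
        # December rolls over into January of the following year.
        return datetime(now_utc.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now_utc.year, now_utc.month + 1, 1, tzinfo=timezone.utc)
```

Calling next_quota_reset(datetime.now(timezone.utc)) gives the moment a scheduler could safely resume quota-limited work.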
Response headers
Every 2xx response (and 429s) carries these headers so your client can pace itself:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit-Backlinks | Monthly cap (1000). |
| X-RateLimit-Remaining-Backlinks | Calls left this month. |
| X-RateLimit-Limit-Gap | Monthly gap-job cap (50). |
| X-RateLimit-Remaining-Gap | Gap jobs left this month. |
| X-RateLimit-Reset | Unix timestamp of the next month rollover. |
| X-Request-ID | Echo this on support tickets. |
| Retry-After | Seconds until quota resets. Sent only on 429. |
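One way a client might use these headers is to collect them into a small structure after each call. The sketch below assumes every value arrives as a decimal string and treats a missing header as unknown. QuotaStatus and read_quota are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class QuotaStatus:
    backlinks_remaining: Optional[int]
    gap_remaining: Optional[int]
    reset_epoch: Optional[int]    # X-RateLimit-Reset, Unix seconds
    retry_after_s: Optional[int]  # Retry-After, sent only on 429
    request_id: Optional[str]


def read_quota(headers: Mapping[str, str]) -> QuotaStatus:
    """Extract the documented quota headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        raw = lowered.get(name.lower())
        return int(raw) if raw is not None else None

    return QuotaStatus(
        backlinks_remaining=as_int("X-RateLimit-Remaining-Backlinks"),
        gap_remaining=as_int("X-RateLimit-Remaining-Gap"),
        reset_epoch=as_int("X-RateLimit-Reset"),
        retry_after_s=as_int("Retry-After"),
        request_id=lowered.get("x-request-id"),
    )
```

With the requests library, read_quota(r.headers) should work as written, since r.headers behaves like a mapping of header names to strings.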
5. Errors
Every non-2xx response uses the same envelope:
{
  "error": "<code>",
  "message": "<human readable>",
  "request_id": "req_a1b2c3d4"
}
| Code | Status | Meaning |
|---|---|---|
| auth_missing | 401 | Authorization header missing or malformed. |
| auth_invalid | 401 | Key unknown, revoked, or owner refunded. |
| quota_exceeded | 429 | Monthly quota hit; check Retry-After. |
| validation_error | 400 | Request body or query failed validation. |
| not_found | 404 | Resource doesn't exist or isn't yours. |
| internal_error | 500 | Server bug - quote the request_id. |
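Since every failure shares one envelope, a client can map it to a single exception type and keep the request_id for support tickets. This is a sketch under that assumption. CrawlGraphError and error_from_envelope are hypothetical names, and the fallback values for missing fields are a defensive choice rather than documented behaviour.

```python
class CrawlGraphError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status: int, code: str, message: str, request_id: str):
        super().__init__(
            f"HTTP {status} {code}: {message} (request_id={request_id})"
        )
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def error_from_envelope(status: int, body: dict) -> CrawlGraphError:
    """Turn a decoded non-2xx JSON body into a CrawlGraphError."""
    return CrawlGraphError(
        status=status,
        code=body.get("error", "unknown"),
        message=body.get("message", ""),
        request_id=body.get("request_id", ""),
    )
```

A caller could then branch on err.code, for example sleeping for Retry-After seconds on quota_exceeded and logging err.request_id on internal_error.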
6. Endpoints
POST /api/v1/backlinks
Synchronous backlink lookup for a single domain. Counts against the backlinks quota.
Request body
{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04", // optional; default = latest
  "limit": 1000,                   // optional; default 1000, max 10000
  "sort": "authority"              // optional; "authority" (default) | "hosts"
}
Response
{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",
  "release_label": "Apr 2026",
  "total_linking_domains": 4821,
  "returned": 1000,
  "results": [
    { "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
      "cg_authority": 84, "cg_rank": 1421 },
    { "linking_domain": "news.bar.org", "num_hosts": 7, "tld": "org",
      "cg_authority": 71, "cg_rank": 9402 }
  ]
}
Notes
- limit caps at 10,000. The API is for programmatic use, not bulk export - use the dashboard for full datasets.
- Field names match the internal service: linking_domain, num_hosts, tld. No rename layer.
- cg_authority is a 0-100 log-rank percentile derived from Common Crawl's harmonic centrality (higher = more authoritative). cg_rank is the raw PageRank position across the whole graph (1 = top-ranked domain). Both are null for domains that don't appear in the ranks file.
- sort="authority" (default) orders by cg_authority DESC then num_hosts DESC; sort="hosts" preserves the legacy num_hosts DESC order.
- Malformed domain, unknown release_id, or out-of-range limit → 400 validation_error.
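Validation errors are free, but checking the body locally still saves a round trip. The sketch below mirrors the limits in the notes above. The lower bound of 1 for limit and the non-empty check on domain are assumptions (the docs only state the maximum and do not define "malformed"), and backlinks_payload is a hypothetical helper name.

```python
from typing import Optional

MAX_LIMIT = 10_000                  # documented cap
SORT_MODES = ("authority", "hosts")  # documented sort values


def backlinks_payload(
    domain: str,
    release_id: Optional[str] = None,
    limit: int = 1000,
    sort: str = "authority",
) -> dict:
    """Build a request body for POST /api/v1/backlinks, rejecting bad input."""
    domain = domain.strip()
    if not domain:
        raise ValueError("domain must not be empty")
    if not 1 <= limit <= MAX_LIMIT:  # lower bound is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if sort not in SORT_MODES:
        raise ValueError(f"sort must be one of {SORT_MODES}")
    payload = {"domain": domain, "limit": limit, "sort": sort}
    if release_id is not None:
        # Omitting release_id lets the server default to the latest release.
        payload["release_id"] = release_id
    return payload
```

The returned dict can be passed straight to the json= argument of requests.post.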
curl
curl -X POST https://crawlgraph.com/api/v1/backlinks \
-H "Authorization: Bearer cg_live_…" \
-H "Content-Type: application/json" \
-d '{"domain": "example.com", "limit": 500}'
Python (requests)
import os
import requests

# CG_KEY holds the full cg_live_ key.
r = requests.post(
    "https://crawlgraph.com/api/v1/backlinks",
    headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
    json={"domain": "example.com", "limit": 500},
    timeout=30,
)
r.raise_for_status()  # raises on any 4xx/5xx response
data = r.json()
print(data["total_linking_domains"], "linking domains")
Node (fetch)
const res = await fetch("https://crawlgraph.com/api/v1/backlinks", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CG_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ domain: "example.com", limit: 500 }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
console.log(data.total_linking_domains, "linking domains");
GET /api/v1/releases
List the Common Crawl releases CrawlGraph has indexed. Read-only, not counted against quota - this is the caller learning what to pass.
Response
{
  "releases": [
    { "id": "CC-MAIN-2026-04", "label": "Apr 2026", "available": true },
    { "id": "CC-MAIN-2025-50", "label": "Dec 2025", "available": true }
  ]
}
curl
curl https://crawlgraph.com/api/v1/releases \
-H "Authorization: Bearer cg_live_…"
Python (requests)
import os
import requests

r = requests.get(
    "https://crawlgraph.com/api/v1/releases",
    headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
    timeout=30,
)
r.raise_for_status()
# Print each release id with its human-readable label.
for rel in r.json()["releases"]:
    print(rel["id"], rel["label"])
Node (fetch)
const res = await fetch("https://crawlgraph.com/api/v1/releases", {
  headers: { Authorization: `Bearer ${process.env.CG_KEY}` },
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const { releases } = await res.json();
for (const r of releases) console.log(r.id, r.label);
POST /api/v1/gap-analysis · GET /api/v1/gap-analysis/{job_id}
Async gap-analysis. POST submits a job (counts against the gap quota); GET polls for status and result. Jobs are retained for 7 days.
POST request body
{
  "my_domain": "example.com",
  "competitor_domains": ["a.com", "b.com", "c.com"] // 1-5 entries
}
POST response (202)
{
  "job_id": "gap_a1b2c3",
  "status": "queued",
  "poll_url": "/api/v1/gap-analysis/gap_a1b2c3"
}
GET response - running
{
  "job_id": "gap_a1b2c3",
  "status": "running",
  "started_at": "2026-04-27T12:34:56Z",
  "progress_pct": 42
}
GET response - completed
{
  "job_id": "gap_a1b2c3",
  "status": "completed",
  "completed_at": "2026-04-27T12:36:18Z",
  "result": {
    "my_domain": "example.com",
    "competitor_domains": ["a.com", "b.com", "c.com"],
    "gaps": [
      { "linking_domain": "x.com", "found_on": ["a.com", "b.com"] }
    ],
    "total_gaps": 1284
  }
}
GET response - failed
{
  "job_id": "gap_a1b2c3",
  "status": "failed",
  "error": { "code": "internal_error", "message": "..." }
}
Notes
- Max 5 competitors per request - matches the dashboard cap.
- GET returns 404 not_found if the job isn't yours, even if the id exists.
- Failed jobs don't refund quota in v1. Email support with the request_id if it matters.
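Once a job reports completed, the result object can be post-processed locally. The sketch below keeps only the gaps found on every competitor in the job (or on at least a chosen number of them), based on the found_on field in the completed response shown above. shared_gaps is a hypothetical helper. It is a rough local filter, not a reimplementation of the gap_outreach_targets MCP tool, which also de-noises and ranks by authority.

```python
from typing import Optional


def shared_gaps(result: dict, min_competitors: Optional[int] = None) -> list[dict]:
    """Return gap entries found on at least `min_competitors` competitors.

    By default a domain must link to every competitor in the job.
    """
    competitors = set(result["competitor_domains"])
    needed = len(competitors) if min_competitors is None else min_competitors

    def overlap(gap: dict) -> int:
        # Count only competitors that belong to this job.
        return len(competitors & set(gap["found_on"]))

    hits = [gap for gap in result["gaps"] if overlap(gap) >= needed]
    # Widest overlap first; ties are broken alphabetically for stable output.
    return sorted(hits, key=lambda gap: (-overlap(gap), gap["linking_domain"]))
```

Given the polling examples below, this would be called as shared_gaps(j["result"]) after checking that j["status"] is "completed".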
curl
# 1. Submit
curl -X POST https://crawlgraph.com/api/v1/gap-analysis \
-H "Authorization: Bearer cg_live_…" \
-H "Content-Type: application/json" \
-d '{"my_domain": "example.com", "competitor_domains": ["a.com","b.com"]}'
# 2. Poll
curl https://crawlgraph.com/api/v1/gap-analysis/gap_a1b2c3 \
-H "Authorization: Bearer cg_live_…"
Python (requests)
import os
import time
import requests

H = {"Authorization": f"Bearer {os.environ['CG_KEY']}"}
# 1. Submit the job (counts against the gap quota).
sub = requests.post(
    "https://crawlgraph.com/api/v1/gap-analysis",
    headers=H,
    json={
        "my_domain": "example.com",
        "competitor_domains": ["a.com", "b.com"],
    },
    timeout=30,
)
sub.raise_for_status()
job_id = sub.json()["job_id"]
# 2. Poll every 5 seconds until the job reaches a terminal state.
while True:
    resp = requests.get(
        f"https://crawlgraph.com/api/v1/gap-analysis/{job_id}",
        headers=H, timeout=30,
    )
    resp.raise_for_status()
    j = resp.json()
    if j["status"] in ("completed", "failed"):
        break
    time.sleep(5)
print(j)
Node (fetch)
const H = { Authorization: `Bearer ${process.env.CG_KEY}` };
const submit = await fetch("https://crawlgraph.com/api/v1/gap-analysis", {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({
    my_domain: "example.com",
    competitor_domains: ["a.com", "b.com"],
  }),
}).then(r => r.json());
const jobId = submit.job_id;
let job;
do {
  await new Promise(r => setTimeout(r, 5000));
  job = await fetch(
    `https://crawlgraph.com/api/v1/gap-analysis/${jobId}`,
    { headers: H },
  ).then(r => r.json());
} while (job.status !== "completed" && job.status !== "failed");
console.log(job);
GET /api/v1/changes
Quarter-over-quarter diff for a domain. Returns referring domains added or lost between two CC releases. Counts as one call against the backlinks quota - same bucket as POST /api/v1/backlinks.
Today this endpoint is a stub. We have ingested only one Common Crawl quarter so there is nothing to diff against. The shape below is final; the response body just carries comparison_available: false until the next release lands (estimated July 2026).
Query parameters
- domain - optional; the target host (same shape as /backlinks).
- from - optional; older release id. Defaults to the previous quarter (which doesn't exist yet → stub).
- to - optional; newer release id. Defaults to the latest available release.
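The query parameters above can be assembled with the standard library, and the comparison_available flag gives callers a simple guard while the endpoint is a stub. This is a minimal sketch. changes_url and comparison_ready are hypothetical helper names, and a missing flag is treated as "not ready" by assumption.

```python
from typing import Optional
from urllib.parse import urlencode

CHANGES_URL = "https://crawlgraph.com/api/v1/changes"


def changes_url(
    domain: Optional[str] = None,
    from_release: Optional[str] = None,
    to_release: Optional[str] = None,
) -> str:
    """Build the GET URL, leaving out any parameter that was not supplied."""
    params = {"domain": domain, "from": from_release, "to": to_release}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{CHANGES_URL}?{query}" if query else CHANGES_URL


def comparison_ready(body: dict) -> bool:
    """True only when the response says a real diff is available."""
    return bool(body.get("comparison_available", False))
```

A scheduled job could call the endpoint, skip processing while comparison_ready(body) is false, and start consuming the diff automatically once the next release is ingested.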
Response - stub (today)
{
  "comparison_available": false,
  "current_release": "cc-main-2026-jan-feb-mar",
  "next_available_after": "cc-main-2026-apr-may-jun",
  "next_available_estimate": "2026-07-15",
  "message": "first delta will be available after the next quarterly Common Crawl release ingests"
}
curl
curl "https://crawlgraph.com/api/v1/changes?domain=example.com" \
-H "Authorization: Bearer cg_live_…"
7. Webhooks & changelog
Webhooks are coming in a future version. For now, polling the gap-analysis job endpoint is the only async pattern.
The current API is at v1. Breaking changes will land on /api/v2 with a deprecation window - your v1 integrations won't break overnight. Check back here for changelogs.
8. OpenAPI
For tooling integration (Postman, openapi-typescript, Insomnia, etc.) the OpenAPI 3.1 schema is served at /api/v1/openapi.json. It covers exactly the endpoints documented above and is regenerated on every deploy.
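As a quick sanity check, the operations in a downloaded schema can be listed and compared against this page. The sketch below relies only on the standard OpenAPI layout, where paths maps each path to an object keyed by lowercase HTTP method. list_operations is a hypothetical helper, and the schema is passed in as an already-decoded dict, so nothing here assumes what the served file contains.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs found under the schema's `paths` object."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    # Sort by path, then method, for a stable listing.
    return sorted(operations, key=lambda op: (op[1], op[0]))
```

After fetching /api/v1/openapi.json with any HTTP client and decoding it, list_operations(schema) gives a compact list to diff between deploys.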