SEO Crawler API — Audit Any Website at Scale
Use SimpleCrawl's API to build SEO crawlers that audit websites at scale. Extract title tags, meta descriptions, headings, links, and technical SEO data from any page.
An SEO crawler API lets you programmatically audit any website's on-page SEO — title tags, meta descriptions, heading hierarchy, internal links, canonical tags, and more. SimpleCrawl extracts this data in a single API call, making it straightforward to build custom SEO audit tools without managing headless browsers or parsing raw HTML.
What You Can Extract for SEO Audits
SimpleCrawl's structured extraction pulls the exact data SEO professionals need:
curl -X POST https://api.simplecrawl.com/scrape \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/seo-guide",
    "output": "json",
    "schema": {
      "title_tag": "string",
      "meta_description": "string",
      "canonical_url": "string",
      "h1": "string",
      "h2s": ["string"],
      "word_count": "number",
      "internal_links": ["string"],
      "external_links": ["string"],
      "images_without_alt": "number",
      "schema_markup_types": ["string"]
    }
  }'
Response:
{
  "data": {
    "title_tag": "Complete SEO Guide for 2026 | Example Blog",
    "meta_description": "Learn everything about SEO in 2026...",
    "canonical_url": "https://example.com/blog/seo-guide",
    "h1": "Complete SEO Guide for 2026",
    "h2s": ["On-Page SEO", "Technical SEO", "Link Building", "Content Strategy"],
    "word_count": 3847,
    "internal_links": ["/blog/keyword-research", "/blog/technical-seo", "/tools/site-audit"],
    "external_links": ["https://developers.google.com/search", "https://moz.com/learn"],
    "images_without_alt": 2,
    "schema_markup_types": ["Article", "BreadcrumbList"]
  }
}
Building a Site-Wide SEO Audit
Step 1: Discover All Pages
Start with the sitemap to get all crawlable URLs:
import simplecrawl
import xml.etree.ElementTree as ET
import requests
client = simplecrawl.Client(api_key="YOUR_KEY")
sitemap_resp = requests.get("https://example.com/sitemap.xml")
root = ET.fromstring(sitemap_resp.content)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]
print(f"Found {len(urls)} URLs to audit")
You can also use our free Sitemap Analyzer tool to inspect sitemaps interactively.
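The snippet above assumes a single flat sitemap. Large sites often publish a sitemap index that points at child sitemaps; a sketch that handles both cases (the `fetch` callback is split out only to keep the parsing logic testable — pass `lambda u: requests.get(u).content` in practice):

```python
import xml.etree.ElementTree as ET

SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_bytes):
    """Classify a sitemap document and return its <loc> entries."""
    root = ET.fromstring(xml_bytes)
    locs = [loc.text for loc in root.findall(".//sm:loc", SM_NS)]
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    return kind, locs

def collect_urls(sitemap_url, fetch):
    """Recurse through sitemap indexes; `fetch(url)` returns raw XML bytes."""
    kind, locs = parse_sitemap(fetch(sitemap_url))
    if kind != "index":
        return locs
    urls = []
    for child in locs:
        urls.extend(collect_urls(child, fetch))
    return urls
```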
Step 2: Extract SEO Data from Every Page
seo_schema = {
    "title_tag": "string",
    "meta_description": "string",
    "canonical_url": "string",
    "h1": "string",
    "h2s": ["string"],
    "word_count": "number",
    "internal_links": ["string"],
    "external_links": ["string"],
    "images_without_alt": "number",
}

results = client.batch(urls=urls, output="json", schema=seo_schema)

audit_data = []
for result in results:
    audit_data.append({
        "url": result.url,
        **result.data,
        "status": "success"
    })
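The loop above assumes every page succeeds. In practice some URLs time out or return errors; a hedged sketch that separates failures, assuming each batch result exposes `url`, `data`, and an `error` attribute (the attribute names are guesses for illustration, not confirmed SimpleCrawl API fields):

```python
def partition_results(results):
    """Split batch results into successful audit rows and failures.

    Assumes result objects carry `url`, `data`, and `error` attributes —
    a hypothetical shape, not documented SimpleCrawl behavior.
    """
    audit_data, failed = [], []
    for result in results:
        if getattr(result, "error", None):
            failed.append({"url": result.url, "error": result.error})
        else:
            audit_data.append({"url": result.url, **result.data, "status": "success"})
    return audit_data, failed
```

Retry the `failed` list once before reporting — transient timeouts are common on large crawls.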
Step 3: Identify Issues
issues = {
    "missing_title": [],
    "long_title": [],
    "missing_meta": [],
    "long_meta": [],
    "missing_h1": [],
    "thin_content": [],
    "no_internal_links": [],
    "images_need_alt": [],
    "canonical_mismatch": [],
}
for page in audit_data:
    if not page.get("title_tag"):
        issues["missing_title"].append(page["url"])
    elif len(page["title_tag"]) > 60:
        issues["long_title"].append(page["url"])
    if not page.get("meta_description"):
        issues["missing_meta"].append(page["url"])
    elif len(page["meta_description"]) > 160:
        issues["long_meta"].append(page["url"])
    if not page.get("h1"):
        issues["missing_h1"].append(page["url"])
    if page.get("word_count", 0) < 300:
        issues["thin_content"].append(page["url"])
    if not page.get("internal_links"):
        issues["no_internal_links"].append(page["url"])
    if page.get("images_without_alt", 0) > 0:
        issues["images_need_alt"].append(page["url"])
    if page.get("canonical_url") and page.get("canonical_url") != page["url"]:
        issues["canonical_mismatch"].append(page["url"])
Step 4: Generate the Report
def generate_report(audit_data, issues):
    report = "# SEO Audit Report\n\n"
    report += f"**Pages audited:** {len(audit_data)}\n\n"
    report += "## Issues Found\n\n"
    report += "| Issue | Count | Severity |\n|---|---|---|\n"
    severity_map = {
        "missing_title": "Critical",
        "missing_h1": "Critical",
        "missing_meta": "High",
        "thin_content": "High",
        "canonical_mismatch": "High",
        "long_title": "Medium",
        "long_meta": "Medium",
        "no_internal_links": "Medium",
        "images_need_alt": "Low",
    }
    for issue_key, urls in issues.items():
        if urls:
            label = issue_key.replace("_", " ").title()
            severity = severity_map.get(issue_key, "Low")
            report += f"| {label} | {len(urls)} | {severity} |\n"
    return report

print(generate_report(audit_data, issues))
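Beyond the markdown summary, you may want the raw issue list in a spreadsheet; a minimal CSV export of the same issues dict, one row per (issue, URL) pair:

```python
import csv

def write_issues_csv(issues, path="seo_issues.csv"):
    """Flatten the issues dict into one CSV row per (issue, url) pair."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["issue", "url"])
        for issue_key, urls in issues.items():
            for url in urls:
                writer.writerow([issue_key, url])
```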
SEO Checks You Can Automate
On-Page Checks
| Check | What SimpleCrawl extracts | Why it matters |
|---|---|---|
| Title tag length | title_tag → character count | Titles much beyond ~60 characters are typically truncated in SERPs |
| Meta description | meta_description | If missing, Google generates its own snippet |
| H1 tag | h1 | Missing or duplicate H1s hurt page focus |
| Heading hierarchy | h2s, h3s | Proper hierarchy helps crawlers understand content structure |
| Word count | word_count | Thin content (under 300 words) rarely ranks |
| Internal links | internal_links | Orphan pages with no internal links are hard to discover |
| Image alt text | images_without_alt | Missing alt text hurts accessibility and image search |
| Canonical tags | canonical_url | Mismatched canonicals cause indexing confusion |
Content Quality Checks
Use SimpleCrawl's markdown output combined with NLP for deeper analysis:
result = client.scrape(url, output="markdown")
markdown = result.markdown
# Assumes you define `primary_keyword` for the page and supply your own
# readability helper; SimpleCrawl only returns the markdown.
checks = {
    "has_primary_keyword": primary_keyword.lower() in markdown[:500].lower(),
    "has_internal_links": "](/" in markdown,
    "has_external_links": "](http" in markdown,
    "has_images": "![" in markdown,
    "has_code_examples": "```" in markdown,
    "readability_score": calculate_flesch_kincaid(markdown),
}
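The readability helper is left for you to supply. A rough, dependency-free sketch of the Flesch reading-ease score — vowel-group syllable counting is a crude approximation, so treat scores as relative comparisons between your own pages, not absolute grades:

```python
import re

def calculate_flesch_kincaid(text):
    """Approximate Flesch reading ease:
    206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
    Higher scores mean easier reading."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0

    def syllables(word):
        # Count contiguous vowel groups as syllables (approximation).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (total_syllables / len(words))
```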
Technical SEO Checks
Combine SimpleCrawl with other tools for a complete technical audit:
# Check robots.txt compliance
robots_result = client.scrape(f"{domain}/robots.txt", output="markdown")
# Check meta robots tags
page_result = client.scrape(url, output="json", schema={
    "meta_robots": "string",
    "x_robots_tag": "string",
    "hreflang_tags": ["string"],
    "structured_data_types": ["string"],
})
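To act on the robots.txt you fetched, Python's standard library can check each audited URL against it directly — no extra API call needed:

```python
from urllib import robotparser

def disallowed_urls(robots_txt, urls, user_agent="*"):
    """Return the subset of `urls` that robots.txt blocks for `user_agent`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]
```

Pages that appear in the sitemap but are disallowed here are worth flagging: they are sending crawlers mixed signals.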
Use our Robots.txt Checker and Meta Tag Extractor for individual page analysis.
Monitoring SEO Over Time
Run audits on a schedule and track changes:
def track_seo_changes(current_audit, previous_audit):
    changes = []
    for curr in current_audit:
        prev = next(
            (p for p in previous_audit if p["url"] == curr["url"]),
            None
        )
        if not prev:
            changes.append({"url": curr["url"], "type": "new_page"})
            continue
        if curr.get("title_tag") != prev.get("title_tag"):
            changes.append({
                "url": curr["url"],
                "type": "title_changed",
                "old": prev.get("title_tag"),
                "new": curr.get("title_tag"),
            })
        if curr.get("word_count", 0) < prev.get("word_count", 0) * 0.5:
            changes.append({
                "url": curr["url"],
                "type": "content_removed",
                "old_count": prev.get("word_count"),
                "new_count": curr.get("word_count"),
            })
    removed = [
        p["url"] for p in previous_audit
        if not any(c["url"] == p["url"] for c in current_audit)
    ]
    for url in removed:
        changes.append({"url": url, "type": "page_removed"})
    return changes
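A diff needs a previous snapshot to compare against. A minimal persistence sketch using dated JSON files — the `audits/` directory name is arbitrary; swap in a database or object store for larger sites:

```python
import json
import os
from datetime import date

def save_snapshot(audit_data, store_dir="audits"):
    """Persist today's audit so the next run can diff against it."""
    os.makedirs(store_dir, exist_ok=True)
    path = os.path.join(store_dir, f"{date.today().isoformat()}.json")
    with open(path, "w") as f:
        json.dump(audit_data, f)
    return path

def load_latest_snapshot(store_dir="audits"):
    """Return the most recent saved audit, or None on the first run."""
    files = sorted(os.listdir(store_dir)) if os.path.isdir(store_dir) else []
    if not files:
        return None
    with open(os.path.join(store_dir, files[-1])) as f:
        return json.load(f)
```

ISO-dated filenames sort lexicographically, so `sorted(...)[-1]` is always the newest snapshot.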
Cost for SEO Auditing
| Site size | Audit frequency | Credits/month | SimpleCrawl plan |
|---|---|---|---|
| 100 pages | Weekly | 400 | Starter ($29) |
| 1,000 pages | Weekly | 4,000 | Starter ($29) |
| 5,000 pages | Weekly | 20,000 | Growth ($79) |
| 10,000 pages | Daily | 300,000 | Enterprise |
| 50,000 pages | Weekly | 200,000 | Enterprise |
Most sites under 5,000 pages fit comfortably in the Starter or Growth plan with weekly audits.
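The table's figures imply roughly one credit per page per crawl, rounded to plan-friendly numbers — an inference from the table, so verify against your plan. Estimating your own usage is simple arithmetic:

```python
def monthly_credits(pages, audits_per_month, credits_per_page=1):
    """Estimate monthly credit usage, assuming a flat per-page credit cost."""
    return pages * audits_per_month * credits_per_page

# Weekly audits average about 4.33 runs per month; daily is about 30.
```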
FAQ
What is an SEO crawler API?
An SEO crawler API is a web scraping service optimized for extracting SEO-relevant data from web pages — title tags, meta descriptions, headings, links, schema markup, and content metrics. Unlike general-purpose scraping, SEO crawling focuses on the metadata and structural elements that affect search rankings.
How is this different from Screaming Frog or Ahrefs?
Screaming Frog and Ahrefs are complete SEO tools with crawling built in. SimpleCrawl is the extraction layer — you get raw SEO data and build custom analysis on top. Use SimpleCrawl when you need programmatic access to SEO data for custom dashboards, alerts, or integration with your existing tools.
Can I audit competitor websites?
Yes. SimpleCrawl works on any public website. Audit competitor title tags, content structure, internal linking, and keyword targeting to inform your own SEO strategy. Combine with content aggregation to track competitor content publishing.
How many pages can I audit at once?
SimpleCrawl's batch API handles thousands of URLs in a single request. For very large sites (50,000+ pages), use the webhook delivery option to receive results asynchronously.
Does SimpleCrawl handle JavaScript-rendered SEO elements?
Yes. SimpleCrawl renders JavaScript before extraction, catching dynamically inserted title tags, schema markup, and content that server-side-only crawlers miss. This is critical for auditing SPAs and sites using client-side rendering.
Can I check for Core Web Vitals with SimpleCrawl?
SimpleCrawl extracts on-page SEO elements, not performance metrics. For Core Web Vitals (LCP, INP, CLS), use Google's PageSpeed Insights API or Chrome UX Report alongside SimpleCrawl's on-page data.
Get Started
Build your custom SEO audit tool with SimpleCrawl. Join the waitlist for 500 free credits — enough to audit a 500-page site. For deeper SEO analysis of individual pages, try our free Meta Tag Extractor.