SimpleCrawl
Use case: SEO · Web crawling · Site audit

SEO Crawler API — Audit Any Website at Scale

Use SimpleCrawl's API to build SEO crawlers that audit websites at scale. Extract title tags, meta descriptions, headings, links, and technical SEO data from any page.

SimpleCrawl Team · 7 min read

An SEO crawler API lets you programmatically audit any website's on-page SEO — title tags, meta descriptions, heading hierarchy, internal links, canonical tags, and more. SimpleCrawl extracts this data in a single API call, making it straightforward to build custom SEO audit tools without managing headless browsers or parsing raw HTML.

What You Can Extract for SEO Audits

SimpleCrawl's structured extraction pulls the exact data SEO professionals need:

curl -X POST https://api.simplecrawl.com/scrape \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/seo-guide",
    "output": "json",
    "schema": {
      "title_tag": "string",
      "meta_description": "string",
      "canonical_url": "string",
      "h1": "string",
      "h2s": ["string"],
      "word_count": "number",
      "internal_links": ["string"],
      "external_links": ["string"],
      "images_without_alt": "number",
      "schema_markup_types": ["string"]
    }
  }'

Response:

{
  "data": {
    "title_tag": "Complete SEO Guide for 2026 | Example Blog",
    "meta_description": "Learn everything about SEO in 2026...",
    "canonical_url": "https://example.com/blog/seo-guide",
    "h1": "Complete SEO Guide for 2026",
    "h2s": ["On-Page SEO", "Technical SEO", "Link Building", "Content Strategy"],
    "word_count": 3847,
    "internal_links": ["/blog/keyword-research", "/blog/technical-seo", "/tools/site-audit"],
    "external_links": ["https://developers.google.com/search", "https://moz.com/learn"],
    "images_without_alt": 2,
    "schema_markup_types": ["Article", "BreadcrumbList"]
  }
}

Building a Site-Wide SEO Audit

Step 1: Discover All Pages

Start with the sitemap to get all crawlable URLs:

import simplecrawl
import xml.etree.ElementTree as ET
import requests

client = simplecrawl.Client(api_key="YOUR_KEY")

sitemap_resp = requests.get("https://example.com/sitemap.xml")
root = ET.fromstring(sitemap_resp.content)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]

print(f"Found {len(urls)} URLs to audit")

You can also use our free Sitemap Analyzer tool to inspect sitemaps interactively.
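Large sites often serve a sitemap index that points at child sitemaps rather than listing pages directly. A sketch that extends Step 1 to follow sitemapindex files recursively (parsing is separated from fetching so it's easy to test; this is not part of the SimpleCrawl client):

```python
import xml.etree.ElementTree as ET
from typing import Callable

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_bytes: bytes, fetch: Callable[[str], bytes]) -> list[str]:
    """Return page URLs, recursing into child sitemaps for index files."""
    root = ET.fromstring(xml_bytes)
    # Index files use <sitemapindex> as the root; regular sitemaps use <urlset>
    if root.tag.endswith("sitemapindex"):
        urls: list[str] = []
        for loc in root.findall(".//sm:loc", NS):
            urls.extend(parse_sitemap(fetch(loc.text), fetch))
        return urls
    return [loc.text for loc in root.findall(".//sm:loc", NS)]

def collect_urls(sitemap_url: str) -> list[str]:
    import requests  # third-party, already used in Step 1
    fetch = lambda u: requests.get(u).content
    return parse_sitemap(fetch(sitemap_url), fetch)
```

Because `parse_sitemap` takes a `fetch` callback, you can swap in a cached or rate-limited fetcher without touching the parsing logic.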

Step 2: Extract SEO Data from Every Page

seo_schema = {
    "title_tag": "string",
    "meta_description": "string",
    "canonical_url": "string",
    "h1": "string",
    "h2s": ["string"],
    "word_count": "number",
    "internal_links": ["string"],
    "external_links": ["string"],
    "images_without_alt": "number",
}

results = client.batch(urls=urls, output="json", schema=seo_schema)

audit_data = []
for result in results:
    # Skip pages that failed to scrape so downstream checks don't
    # flag them as missing every element
    if not result.data:
        continue
    audit_data.append({"url": result.url, **result.data})

Step 3: Identify Issues

issues = {
    "missing_title": [],
    "long_title": [],
    "missing_meta": [],
    "long_meta": [],
    "missing_h1": [],
    "multiple_h1": [],
    "thin_content": [],
    "no_internal_links": [],
    "images_need_alt": [],
    "canonical_mismatch": [],
}

for page in audit_data:
    if not page.get("title_tag"):
        issues["missing_title"].append(page["url"])
    elif len(page["title_tag"]) > 60:
        issues["long_title"].append(page["url"])

    if not page.get("meta_description"):
        issues["missing_meta"].append(page["url"])
    elif len(page["meta_description"]) > 160:
        issues["long_meta"].append(page["url"])

    if not page.get("h1"):
        issues["missing_h1"].append(page["url"])

    if page.get("word_count", 0) < 300:
        issues["thin_content"].append(page["url"])

    if not page.get("internal_links"):
        issues["no_internal_links"].append(page["url"])

    if page.get("images_without_alt", 0) > 0:
        issues["images_need_alt"].append(page["url"])

    # Normalize trailing slashes so "/page" vs "/page/" isn't a false positive
    canonical = (page.get("canonical_url") or "").rstrip("/")
    if canonical and canonical != page["url"].rstrip("/"):
        issues["canonical_mismatch"].append(page["url"])

Step 4: Generate the Report

def generate_report(audit_data, issues):
    report = "# SEO Audit Report\n\n"
    report += f"**Pages audited:** {len(audit_data)}\n\n"
    report += "## Issues Found\n\n"
    report += "| Issue | Count | Severity |\n|---|---|---|\n"

    severity_map = {
        "missing_title": "Critical",
        "missing_h1": "Critical",
        "missing_meta": "High",
        "thin_content": "High",
        "canonical_mismatch": "High",
        "long_title": "Medium",
        "long_meta": "Medium",
        "no_internal_links": "Medium",
        "images_need_alt": "Low",
    }

    for issue_key, urls in issues.items():
        if urls:
            label = issue_key.replace("_", " ").title()
            severity = severity_map.get(issue_key, "Low")
            report += f"| {label} | {len(urls)} | {severity} |\n"

    return report

print(generate_report(audit_data, issues))
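A count-only summary tells you what is wrong but not where. A sketch of a helper that appends per-issue URL lists to the report (reusing the issues dict from Step 3):

```python
def issue_details(issues: dict, max_urls: int = 10) -> str:
    """Render a per-issue URL list to append after the summary table."""
    out = "\n## Details\n\n"
    for issue_key, urls in issues.items():
        if not urls:
            continue
        out += f"### {issue_key.replace('_', ' ').title()} ({len(urls)})\n\n"
        for url in urls[:max_urls]:
            out += f"- {url}\n"
        if len(urls) > max_urls:
            out += f"- ...and {len(urls) - max_urls} more\n"
        out += "\n"
    return out
```

Capping each list at `max_urls` keeps the report readable on large sites while still naming enough pages to start fixing.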

SEO Checks You Can Automate

On-Page Checks

| Check | What SimpleCrawl extracts | Why it matters |
|---|---|---|
| Title tag length | title_tag → character count | Titles over 60 chars get truncated in SERPs |
| Meta description | meta_description | Missing descriptions = Google generates its own |
| H1 tag | h1 | Missing or duplicate H1s hurt page focus |
| Heading hierarchy | h2s, h3s | Proper hierarchy helps crawlers understand content structure |
| Word count | word_count | Thin content (under 300 words) rarely ranks |
| Internal links | internal_links | Orphan pages with no internal links are hard to discover |
| Image alt text | images_without_alt | Missing alt text hurts accessibility and image search |
| Canonical tags | canonical_url | Mismatched canonicals cause indexing confusion |

Content Quality Checks

Use SimpleCrawl's markdown output combined with NLP for deeper analysis:

result = client.scrape(url, output="markdown")
markdown = result.markdown

# Assumes primary_keyword (str) and calculate_flesch_kincaid() are defined elsewhere
checks = {
    "has_primary_keyword": primary_keyword.lower() in markdown[:500].lower(),
    "has_internal_links": "[" in markdown and "](/" in markdown,
    "has_external_links": "](http" in markdown,
    "has_images": "![" in markdown,
    "has_code_examples": "```" in markdown,
    "readability_score": calculate_flesch_kincaid(markdown),
}
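The snippet above calls calculate_flesch_kincaid without defining it. A minimal sketch under that name, computing the Flesch Reading Ease score (the Flesch-Kincaid grade formula differs slightly; the syllable counter is a crude vowel-group heuristic):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: each run of consecutive vowels counts as one syllable
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def calculate_flesch_kincaid(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text (roughly 0-100)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))
```

For production use, a dedicated library with a proper syllable dictionary will be more accurate than this heuristic.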

Technical SEO Checks

Combine SimpleCrawl with other tools for a complete technical audit:

# Check robots.txt compliance
robots_result = client.scrape(f"{domain}/robots.txt", output="markdown")

# Check meta robots tags
page_result = client.scrape(url, output="json", schema={
    "meta_robots": "string",
    "x_robots_tag": "string",
    "hreflang_tags": ["string"],
    "structured_data_types": ["string"],
})

Use our Robots.txt Checker and Meta Tag Extractor for individual page analysis.

Monitoring SEO Over Time

Run audits on a schedule and track changes:

def track_seo_changes(current_audit, previous_audit):
    changes = []
    for curr in current_audit:
        prev = next(
            (p for p in previous_audit if p["url"] == curr["url"]),
            None
        )
        if not prev:
            changes.append({"url": curr["url"], "type": "new_page"})
            continue

        if curr.get("title_tag") != prev.get("title_tag"):
            changes.append({
                "url": curr["url"],
                "type": "title_changed",
                "old": prev["title_tag"],
                "new": curr["title_tag"],
            })

        if curr.get("word_count", 0) < prev.get("word_count", 0) * 0.5:
            changes.append({
                "url": curr["url"],
                "type": "content_removed",
                "old_count": prev["word_count"],
                "new_count": curr["word_count"],
            })

    removed = [
        p["url"] for p in previous_audit
        if not any(c["url"] == p["url"] for c in current_audit)
    ]
    for url in removed:
        changes.append({"url": url, "type": "page_removed"})

    return changes

Cost for SEO Auditing

| Site size | Audit frequency | Credits/month | SimpleCrawl plan |
|---|---|---|---|
| 100 pages | Weekly | 400 | Starter ($29) |
| 1,000 pages | Weekly | 4,000 | Starter ($29) |
| 5,000 pages | Weekly | 20,000 | Growth ($79) |
| 10,000 pages | Daily | 300,000 | Enterprise |
| 50,000 pages | Weekly | 200,000 | Enterprise |

Most sites under 5,000 pages fit comfortably in the Starter or Growth plan with weekly audits.
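The credit figures in the table are just pages multiplied by runs per month (about 4 for weekly, 30 for daily), assuming one credit per page scrape:

```python
RUNS_PER_MONTH = {"weekly": 4, "daily": 30}

def monthly_credits(pages: int, frequency: str) -> int:
    """Estimate monthly credits, assuming one credit per page scrape."""
    return pages * RUNS_PER_MONTH[frequency]
```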

FAQ

What is an SEO crawler API?

An SEO crawler API is a web scraping service optimized for extracting SEO-relevant data from web pages — title tags, meta descriptions, headings, links, schema markup, and content metrics. Unlike general-purpose scraping, SEO crawling focuses on the metadata and structural elements that affect search rankings.

How is this different from Screaming Frog or Ahrefs?

Screaming Frog and Ahrefs are complete SEO tools with crawling built in. SimpleCrawl is the extraction layer — you get raw SEO data and build custom analysis on top. Use SimpleCrawl when you need programmatic access to SEO data for custom dashboards, alerts, or integration with your existing tools.

Can I audit competitor websites?

Yes. SimpleCrawl works on any public website. Audit competitor title tags, content structure, internal linking, and keyword targeting to inform your own SEO strategy. Combine with content aggregation to track competitor content publishing.

How many pages can I audit at once?

SimpleCrawl's batch API handles thousands of URLs in a single request. For very large sites (50,000+ pages), use the webhook delivery option to receive results asynchronously.

Does SimpleCrawl handle JavaScript-rendered SEO elements?

Yes. SimpleCrawl renders JavaScript before extraction, catching dynamically-inserted title tags, schema markup, and content that server-side-only crawlers miss. This is critical for auditing SPAs and sites using client-side rendering.

Can I check for Core Web Vitals with SimpleCrawl?

SimpleCrawl extracts on-page SEO elements, not performance metrics. For Core Web Vitals (LCP, FID, CLS), use Google's PageSpeed Insights API or Chrome UX Report alongside SimpleCrawl's on-page data.

Get Started

Build your custom SEO audit tool with SimpleCrawl. Join the waitlist for 500 free credits — enough to audit a 500-page site. For deeper SEO analysis of individual pages, try our free Meta Tag Extractor.

Ready to try SimpleCrawl?

We're building the simplest web scraping API for AI. Join the waitlist and get 500 free credits at launch.

Get early access + 500 free credits