
Lead Generation API — Extract Business Data at Scale

Use SimpleCrawl's API to extract business contact information, company data, and lead details from public websites at scale. Build targeted prospect lists programmatically.

SimpleCrawl Team · 7 min read

A lead generation API extracts business contact information, company details, and prospect data from publicly available websites. Instead of manually browsing directories, company pages, and LinkedIn profiles, you use SimpleCrawl's structured extraction to pull names, emails, phone numbers, company sizes, and more — then feed that data directly into your CRM.

What Lead Data You Can Extract

SimpleCrawl's schema-based extraction handles the most common lead generation targets:

Company Websites

import simplecrawl

client = simplecrawl.Client(api_key="YOUR_KEY")

result = client.scrape("https://example-company.com/about", output="json", schema={
    "company_name": "string",
    "description": "string",
    "industry": "string",
    "founded_year": "number",
    "team_size": "string",
    "headquarters": "string",
    "email": "string",
    "phone": "string",
    "social_links": {
        "linkedin": "string",
        "twitter": "string"
    }
})

print(result.data)

Business Directories

result = client.scrape("https://directory.example.com/plumbers/new-york", output="json", schema={
    "businesses": [{
        "name": "string",
        "address": "string",
        "phone": "string",
        "website": "string",
        "rating": "number",
        "review_count": "number",
        "categories": ["string"]
    }]
})

Job Listings (Company Research)

result = client.scrape("https://example-company.com/careers", output="json", schema={
    "company_name": "string",
    "open_positions": [{
        "title": "string",
        "department": "string",
        "location": "string",
    }],
    "tech_stack_mentioned": ["string"],
    "total_openings": "number"
})

Job listings reveal a company's tech stack, growth stage (how many positions are open), and organizational structure — all valuable for B2B sales targeting.
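As a rough illustration, the total_openings field from that schema can be bucketed into growth stages for targeting. The thresholds below are arbitrary examples, not industry standards:

```python
def growth_stage(total_openings):
    """Rough heuristic: bucket a company by hiring volume.
    Thresholds are illustrative -- tune them to your market."""
    if total_openings >= 50:
        return "scaling"
    if total_openings >= 10:
        return "growing"
    if total_openings >= 1:
        return "hiring"
    return "static"
```

A "scaling" company is often a better outreach target than a "static" one, since active hiring signals budget and change.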

Building a Lead Generation Pipeline

Step 1: Source Discovery

Identify where your target leads exist on the web:

sources = [
    {
        "type": "directory",
        "urls": [
            "https://clutch.co/web-developers",
            "https://www.g2.com/categories/web-scraping",
            "https://www.producthunt.com/topics/developer-tools",
        ],
        "schema": {
            "companies": [{
                "name": "string",
                "website": "string",
                "description": "string",
                "rating": "number",
            }]
        }
    },
    {
        "type": "company_page",
        "schema": {
            "company_name": "string",
            "email": "string",
            "phone": "string",
            "team_size": "string",
            "industry": "string",
            "description": "string",
        }
    }
]

Step 2: Extract Directory Listings

all_companies = []

for source in sources:
    if source["type"] != "directory":
        continue
    for url in source["urls"]:
        result = client.scrape(url, output="json", schema=source["schema"])
        for company in result.data.get("companies", []):
            all_companies.append({
                "name": company["name"],
                "website": company.get("website"),
                "source": url,
                "rating": company.get("rating"),
            })

print(f"Found {len(all_companies)} companies from directories")

Step 3: Enrich with Company Details

enriched_leads = []

company_schema = {
    "company_name": "string",
    "description": "string",
    "email": "string",
    "phone": "string",
    "team_size": "string",
    "headquarters": "string",
    "founded_year": "number",
    "services": ["string"],
}

for company in all_companies:
    if not company.get("website"):
        continue

    base = company["website"].rstrip("/")
    about_urls = [
        f"{base}/about",
        f"{base}/about-us",
        f"{base}/contact",
    ]

    enriched = False
    for about_url in about_urls:
        try:
            detail = client.scrape(about_url, output="json", schema=company_schema)
            enriched_leads.append({
                **company,
                **detail.data,
                "enriched": True,
            })
            enriched = True
            break
        except Exception:
            continue  # try the next candidate page

    if not enriched:
        # keep the lead even when no about/contact page could be scraped
        enriched_leads.append({**company, "enriched": False})

Step 4: Score and Prioritize

def score_lead(lead):
    score = 0

    if lead.get("email"):
        score += 20
    if lead.get("phone"):
        score += 10

    team = lead.get("team_size", "")
    if any(s in team.lower() for s in ["50", "100", "200"]):
        score += 30
    elif any(s in team.lower() for s in ["10", "20", "30"]):
        score += 20

    if lead.get("rating") and lead["rating"] >= 4.0:
        score += 15

    if lead.get("services"):
        score += 5

    return score

for lead in enriched_leads:
    lead["score"] = score_lead(lead)

enriched_leads.sort(key=lambda x: x["score"], reverse=True)

Step 5: Export to CRM

import csv

def export_to_csv(leads, filename="leads.csv"):
    if not leads:
        return
    fields = ["company_name", "website", "email", "phone", "team_size",
              "headquarters", "description", "score", "source"]
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(leads)

export_to_csv(enriched_leads)

Or push directly to your CRM via API:

import requests

def push_to_hubspot(leads, api_key):
    for lead in leads:
        resp = requests.post(
            "https://api.hubapi.com/crm/v3/objects/companies",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "properties": {
                    "name": lead.get("company_name", ""),
                    "domain": lead.get("website", ""),
                    "phone": lead.get("phone", ""),
                    "description": lead.get("description", ""),
                    "numberofemployees": lead.get("team_size", ""),
                }
            },
        )
        resp.raise_for_status()  # surface auth errors and rate limits early

Ethical Lead Generation

Web scraping for lead generation operates in a legal and ethical gray area. Follow these guidelines:

  1. Only scrape public data. Information on public company websites, directories, and profiles is fair game. Do not scrape behind login walls.
  2. Respect robots.txt. Check the target site's robots.txt before scraping.
  3. Do not scrape personal data in the EU without a lawful basis. GDPR applies to personal data (names, emails) of EU residents.
  4. Honor opt-out requests. If someone asks to be removed from your list, remove them.
  5. Rate-limit your scraping. Don't hammer sites with thousands of requests per minute.
  6. Provide value. The leads you contact should actually benefit from your product/service.
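Guideline 5 is easy to build in. A minimal throttle, assuming the client object from the earlier examples, simply pauses between requests:

```python
import time

def scrape_politely(client, urls, schema, delay_seconds=2.0):
    """Scrape a list of URLs with a fixed pause between requests
    so the target site never sees a burst of traffic."""
    results = []
    for url in urls:
        try:
            results.append(client.scrape(url, output="json", schema=schema))
        except Exception:
            results.append(None)  # record the failure, keep going
        time.sleep(delay_seconds)  # at most one request per delay interval
    return results
```

For larger jobs, replace the fixed delay with exponential backoff on errors, but even this simple version keeps you well under most sites' tolerance.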

Lead Generation Use Cases

B2B SaaS Sales

Scrape SaaS review sites (G2, Capterra) for companies using competitor products. Extract company names, websites, and reviews, then enrich with contact data from company websites.
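A sketch of that first step, assuming the client from the earlier examples. The URL and schema field names are illustrative, not site-specific selectors:

```python
def find_competitor_users(client, review_url):
    """Pull product listings from a SaaS review page.
    Field names are illustrative -- adjust to what the page exposes."""
    schema = {
        "products": [{
            "name": "string",
            "vendor_website": "string",
            "rating": "number",
            "review_count": "number",
        }]
    }
    result = client.scrape(review_url, output="json", schema=schema)
    return result.data.get("products", [])
```

Feed the vendor_website values into the enrichment loop from Step 3 to turn review-site listings into contactable leads.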

Local Business Outreach

Scrape business directories (Yelp, Yellow Pages, industry-specific directories) for local businesses that match your target profile. Extract addresses, phone numbers, and service categories.

Recruitment

Scrape company career pages to identify growing companies (many open positions), their tech stacks (from job descriptions), and hiring patterns. Useful for staffing agencies and recruitment tools.

Investor Research

Scrape startup directories (Product Hunt, Crunchbase profiles, AngelList) for company data, funding information, and team details. Combine with research data extraction for deeper analysis.

Cost for Lead Generation

| Volume | Sources | Credits/month | Plan | Cost |
| --- | --- | --- | --- | --- |
| 100 leads/month | 3 directories + enrichment | ~500 | Starter | $29 |
| 500 leads/month | 5 directories + enrichment | ~3,000 | Starter | $29 |
| 2,000 leads/month | 10 directories + enrichment | ~12,000 | Growth | $79 |
| 10,000 leads/month | Multiple sources + deep enrichment | ~60,000 | Scale | $199 |

Compare to enterprise lead generation tools like ZoomInfo ($15,000+/year) or Apollo ($5,000+/year). SimpleCrawl gives you the raw extraction capability at a fraction of the cost — though you build the enrichment and scoring logic yourself.

FAQ

Is web scraping for lead generation legal?

Scraping publicly available business information (company names, public email addresses, phone numbers on contact pages) is generally legal in the US. The hiQ Labs v. LinkedIn rulings held that scraping public data does not violate the Computer Fraud and Abuse Act, though breach-of-contract and other claims can still apply. GDPR in Europe adds restrictions on processing personal data. Always scrape responsibly and consult legal counsel for your specific use case.

What is the difference between a lead generation API and a lead database?

A lead database (ZoomInfo, Apollo, Lusha) provides pre-collected contact data. A lead generation API (SimpleCrawl) gives you the tools to extract data from any source. Databases are faster for common targets but limited to their coverage. An API lets you scrape niche directories, industry-specific sites, and company pages that databases miss.

Can I extract email addresses with SimpleCrawl?

Yes — SimpleCrawl extracts email addresses that are publicly visible on web pages (contact pages, about pages, footer sections). It does not guess or generate email addresses. For email finding based on name + domain patterns, combine SimpleCrawl data with a dedicated email finder service.

How do I avoid duplicate leads?

Deduplicate by domain name (most reliable for B2B) or by company name (fuzzy matching). Before adding a lead to your CRM, check if the domain already exists.
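A minimal domain-based deduplication pass using only the standard library:

```python
from urllib.parse import urlparse

def normalize_domain(url):
    """Lower-case the host and strip a leading 'www.' so that
    'https://www.Example.com/about' and 'http://example.com' match."""
    host = urlparse(url if "//" in url else f"https://{url}").netloc.lower()
    return host[4:] if host.startswith("www.") else host

def dedupe_by_domain(leads):
    """Keep the first lead seen for each normalized domain."""
    seen = set()
    unique = []
    for lead in leads:
        key = normalize_domain(lead.get("website") or "")
        if key and key in seen:
            continue  # duplicate domain -- skip
        seen.add(key)
        unique.append(lead)
    return unique
```

Run this before Step 5 so duplicates never reach the CSV export or the CRM push.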

Can I combine this with price monitoring?

Yes. Many teams use SimpleCrawl for both price monitoring (tracking competitor pricing) and lead generation (finding potential customers). Both use the same structured extraction API with different schemas.
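In practice, the only difference between the two pipelines is the schema you pass; the scrape call itself is identical. Field names here are illustrative:

```python
# Same client, same scrape() call -- only the schema differs.
price_schema = {
    "product_name": "string",
    "price": "number",
    "currency": "string",
    "in_stock": "string",
}

lead_schema = {
    "company_name": "string",
    "email": "string",
    "phone": "string",
}

# client.scrape(url, output="json", schema=price_schema)  # price monitoring
# client.scrape(url, output="json", schema=lead_schema)   # lead generation
```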

How does SimpleCrawl compare to dedicated scraping platforms for leads?

Platforms like Apify have pre-built scrapers for specific sites (LinkedIn, Google Maps). SimpleCrawl is general-purpose — it works on any website but without site-specific optimization. For popular targets, see our SimpleCrawl vs Apify comparison.

Get Started

Build your lead generation pipeline with SimpleCrawl. Join the waitlist for 500 free credits — enough to extract and enrich hundreds of leads from your target directories.
