SimpleCrawl

How to Scrape Indeed — Complete Guide (2026)

Learn how to scrape Indeed job listings, salaries, and company reviews. Compare Python scrapers with the SimpleCrawl API for reliable Indeed data extraction.

Indeed is the world's largest job search engine, aggregating millions of job listings across every industry and location. Scraping Indeed data powers recruiting tools, salary benchmarking platforms, labor market research, and job aggregation services. This guide covers practical methods for extracting Indeed job listings, salary data, and company reviews at scale.

What Data Can You Extract from Indeed?

Indeed pages contain structured employment data across several content types:

  • Job listings — title, company, location, salary (when displayed), job description, qualifications, benefits, posting date, job type (full-time, part-time, contract)
  • Search results — job cards with title, company, location, salary snippet, urgently hiring badge, easy apply status
  • Salary data — average salaries by role and location, salary ranges, hourly vs. annual, reported vs. estimated
  • Company reviews — overall rating, individual review text, pros/cons, work-life balance, management, culture scores
  • Company profiles — overview, employee count, headquarters, industry, photos, interview insights
  • Trends data — job posting volume by keyword, location demand, hiring urgency indicators

This data feeds recruiting platforms, lead generation for HR tech, compensation analysis tools, and labor market intelligence dashboards.
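Salary values usually arrive as free text, so compensation analysis needs them normalized first. The following is a minimal sketch, assuming US-dollar strings such as "$120,000 - $150,000 a year"; the function name `parse_salary` and the set of recognized pay periods are illustrative choices, not part of any Indeed or SimpleCrawl API.

```python
import re
from typing import Optional, Tuple

# Matches dollar amounts such as "$120,000", "$45.50" or "$85K".
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)([kK])?")
# Pay-period words we look for in the text (an assumed, non-exhaustive list).
_PERIODS = ("hour", "day", "week", "month", "year")


def parse_salary(text: str) -> Optional[Tuple[float, float, Optional[str]]]:
    """Return (low, high, period) for a salary string, or None if no amount is found."""
    amounts = []
    for digits, kilo in _AMOUNT.findall(text):
        value = float(digits.replace(",", ""))
        # A trailing "K" means thousands, e.g. "$85K" -> 85000.
        amounts.append(value * 1000 if kilo else value)
    if not amounts:
        return None
    lowered = text.lower()
    period = next((word for word in _PERIODS if word in lowered), None)
    # A single amount yields low == high.
    return min(amounts), max(amounts), period
```

A single figure such as "$25.50 an hour" comes back with identical low and high values, which keeps downstream code from special-casing ranges.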

Challenges When Scraping Indeed

Indeed has invested significantly in anti-scraping technology:

Cloudflare Protection

Indeed uses Cloudflare's enterprise anti-bot suite, including JavaScript challenges, browser fingerprinting, and Turnstile challenges. Basic HTTP requests are often blocked before they reach Indeed's servers.

Dynamic Content Loading

Job descriptions load dynamically when you click a listing, and parts of the search results page are lazy-loaded. A static HTTP request captures only a fraction of the available data.

Rate Limiting and IP Bans

Indeed monitors request velocity and patterns. Exceeding its thresholds results in temporary IP blocks, which typically show up as an HTTP 403 response or a redirect to a CAPTCHA page. Indeed also maintains IP reputation databases shared with Cloudflare.
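One common way to stay under velocity thresholds is to back off exponentially, with random jitter, whenever a block signal appears. This is a sketch under stated assumptions: treating 429 as a block status alongside 403 is our assumption, and the delay values are illustrative rather than tuned against Indeed's real limits.

```python
import random
import time
from typing import Callable, Optional

# Assumed block signals: 403 is the status named above, 429 is a guess.
BLOCK_STATUSES = {403, 429}


def fetch_with_backoff(
    fetch: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call fetch() until it returns a non-block status or attempts run out.

    fetch is any zero-argument callable that returns an HTTP status code.
    Returns the first non-block status, or None if every attempt was blocked.
    """
    for attempt in range(max_attempts):
        status = fetch()
        if status not in BLOCK_STATUSES:
            return status
        if attempt < max_attempts - 1:
            # Full jitter: wait a random time between 0 and the capped
            # exponential ceiling (base_delay * 2**attempt).
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    return None
```

With the requests library, a call might look like `fetch_with_backoff(lambda: requests.get(url, timeout=15).status_code)`. Passing `sleep` as a parameter keeps the function easy to test without real waiting.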

Geolocation Filtering

Indeed serves different results based on your IP's geographic location. Scraping jobs in New York from a European IP produces inaccurate results. Geo-targeted proxies are essential.
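As a rough illustration of geo-targeting in a DIY scraper, the sketch below builds the `proxies` mapping that `requests.get()` accepts. The proxy hostnames, credentials, and the `proxies_for_country` helper are all hypothetical placeholders; substitute whatever gateways your proxy provider supplies.

```python
from typing import Dict

# Hypothetical proxy endpoints keyed by country code.
PROXY_POOL: Dict[str, str] = {
    "us": "http://user:pass@us.proxy.example.com:8000",
    "gb": "http://user:pass@gb.proxy.example.com:8000",
    "de": "http://user:pass@de.proxy.example.com:8000",
}


def proxies_for_country(country: str, pool: Dict[str, str] = PROXY_POOL) -> Dict[str, str]:
    """Build a requests-style proxies mapping for one country code."""
    key = country.lower()
    if key not in pool:
        raise KeyError(f"No proxy configured for country {country!r}")
    endpoint = pool[key]
    # Route both plain and TLS traffic through the same endpoint.
    return {"http": endpoint, "https": endpoint}
```

The result can be passed straight through, for example `requests.get(url, proxies=proxies_for_country("us"), timeout=15)`.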

Frequent HTML Changes

Indeed's frontend is under active development. Class names, data attributes, and DOM structure change regularly, breaking CSS selector-based scrapers.
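One way to soften this in a DIY scraper is to try several candidate selectors in order and take the first that matches. The sketch below assumes `node` is any object exposing `select_one(css)`, such as a BeautifulSoup `Tag`. Only the first selector in the list appears in this guide's Python example; the other two are hypothetical fallbacks for illustration.

```python
from typing import Any, Iterable, Optional


def first_text(node: Any, selectors: Iterable[str]) -> Optional[str]:
    """Return the stripped text of the first selector that matches, else None."""
    for selector in selectors:
        element = node.select_one(selector)
        if element is not None:
            text = element.get_text(strip=True)
            if text:
                return text
    return None


# Candidate selectors, tried in order. The first is the one this guide's
# Python example uses; the rest are hypothetical fallbacks.
TITLE_SELECTORS = [
    "h2.jobTitle a span",
    "h2.jobTitle",
    "[data-testid='job-title']",
]
```

Calling `first_text(card, TITLE_SELECTORS)` returns `None` only when every candidate fails, which is a useful signal that the page layout has changed and the list needs updating.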

Method 1: Using SimpleCrawl API (Easiest)

SimpleCrawl bypasses Cloudflare, renders JavaScript, and extracts structured job data:

curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.indeed.com/jobs?q=software+engineer&l=San+Francisco",
    "format": "extract",
    "schema": {
      "jobs": [{
        "title": "string",
        "company": "string",
        "location": "string",
        "salary": "string",
        "snippet": "string",
        "posted": "string",
        "job_type": "string"
      }]
    }
  }'
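
The same request can be assembled in Python. This sketch uses only the standard library and mirrors the endpoint, headers, and payload fields from the curl command above. The shape of the response is not documented here, so `send` simply decodes whatever JSON comes back; treat that as an assumption.

```python
import json
import urllib.request

API_URL = "https://api.simplecrawl.com/v1/scrape"  # endpoint from the curl example


def build_scrape_request(api_key: str, target_url: str, schema: dict) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    payload = {"url": target_url, "format": "extract", "schema": schema}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the body as JSON (response shape is assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


JOB_SCHEMA = {
    "jobs": [{"title": "string", "company": "string", "location": "string", "salary": "string"}]
}
request = build_scrape_request(
    "sc_your_api_key",
    "https://www.indeed.com/jobs?q=software+engineer&l=San+Francisco",
    JOB_SCHEMA,
)
```

Building the request separately from sending it makes the payload easy to inspect or log before any credits are spent.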

For a full job description page:

{
  "url": "https://www.indeed.com/viewjob?jk=abc123def456",
  "format": "extract",
  "schema": {
    "title": "string",
    "company": "string",
    "location": "string",
    "salary_range": "string",
    "description": "string",
    "qualifications": ["string"],
    "benefits": ["string"],
    "job_type": "string"
  }
}

Method 2: DIY with Python (Manual)

Basic Approach with Requests and BeautifulSoup

import time

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def scrape_indeed_jobs(query: str, location: str, pages: int = 1) -> list:
    """Collect job cards from Indeed search result pages."""
    jobs = []
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }

    for page in range(pages):
        # Indeed paginates with a start offset in steps of 10.
        start = page * 10
        url = (
            f"https://www.indeed.com/jobs"
            f"?q={quote_plus(query)}&l={quote_plus(location)}&start={start}"
        )
        # A timeout keeps a stalled connection from hanging the scraper.
        response = requests.get(url, headers=headers, timeout=15)

        if response.status_code != 200:
            # Usually a block (403), but any non-200 status stops the run.
            print(f"Stopped on page {page}: HTTP {response.status_code}")
            break

        soup = BeautifulSoup(response.text, "html.parser")
        # Selectors match Indeed's markup at the time of writing and may change.
        cards = soup.select("div.job_seen_beacon")

        for card in cards:
            title_el = card.select_one("h2.jobTitle a span")
            company_el = card.select_one("[data-testid='company-name']")
            location_el = card.select_one("[data-testid='text-location']")
            salary_el = card.select_one("div.salary-snippet-container")

            jobs.append({
                "title": title_el.text.strip() if title_el else None,
                "company": company_el.text.strip() if company_el else None,
                "location": location_el.text.strip() if location_el else None,
                "salary": salary_el.text.strip() if salary_el else None,
            })

        # Pause between pages to keep request velocity low.
        time.sleep(2)

    return jobs

if __name__ == "__main__":
    results = scrape_indeed_jobs("data engineer", "New York", pages=3)
    for job in results:
        print(f"{job['title']} at {job['company']} ({job['location']})")

Using Playwright for Full Job Descriptions

from playwright.sync_api import sync_playwright

def scrape_indeed_job_detail(job_url: str) -> dict:
    """Render a job page in headless Chromium and pull out its main fields."""
    # Selectors match Indeed's markup at the time of writing and may change.
    selectors = {
        "title": "h1.jobsearch-JobInfoHeader-title",
        "company": "[data-company-name='true']",
        "description": "#jobDescriptionText",
        "salary": "#salaryInfoAndJobType",
    }
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(job_url, wait_until="networkidle")
            page.wait_for_timeout(2000)  # give late-loading widgets time to render

            result = {}
            for field, selector in selectors.items():
                # query_selector returns None for a missing element instead of
                # waiting and raising, so optional fields such as salary are safe.
                element = page.query_selector(selector)
                text = element.text_content() if element else None
                result[field] = text.strip() if text else None
            return result
        finally:
            # Close the browser even if navigation or parsing fails.
            browser.close()

For detailed Python scraping patterns, see our web scraping with Python guide. TypeScript users should check the TypeScript scraping guide.

Why SimpleCrawl Is Better for Indeed

Feature           | DIY Python            | SimpleCrawl
Cloudflare bypass | Very difficult        | Built-in
Geo-targeting     | Proxy config          | API parameter
JS rendering      | Playwright/Selenium   | Automatic
Structured output | Custom parsing        | Schema-based
Maintenance       | High (layout changes) | Zero
Scale             | Limited by proxies    | Enterprise-ready

Indeed's Cloudflare protection makes DIY scraping particularly frustrating. SimpleCrawl eliminates this friction entirely. See how we compare on our comparison page.

Legal Considerations

  • Indeed's ToS prohibit scraping: like most job sites, Indeed explicitly prohibits automated data collection in its terms.
  • Public job data: job listings are posted publicly for broad distribution. US courts have in some cases declined to treat scraping of publicly accessible pages as unauthorized access, but breach-of-contract claims under a site's terms remain possible.
  • Salary data: Indeed's salary data is aggregated from user-reported and estimated figures. Republishing it may raise copyright and ToS concerns.
  • GDPR and hiring data: if scraping involves personal data (recruiter names, company contacts), GDPR obligations apply to EU data.
  • robots.txt compliance: check Indeed's crawling permissions with our robots.txt checker.

FAQ

How many Indeed job listings can I scrape?

DIY scrapers typically hit blocks after 100–200 pages due to Cloudflare. SimpleCrawl supports thousands of job pages daily. See pricing for credit costs.

Can I scrape Indeed salary data?

Yes. Indeed's salary pages (/career/salaries) contain average and range data by role and location. SimpleCrawl extracts this data in structured JSON format.

How do I scrape Indeed for a specific location?

Pass the location parameter in the Indeed URL (&l=San+Francisco) or use SimpleCrawl with geo-targeted proxies for accurate location-based results.
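
For the URL approach, a small helper keeps the encoding consistent. This sketch uses the `q`, `l`, and `start` parameters that appear elsewhere in this guide; the function name `indeed_search_url` is our own.

```python
from urllib.parse import urlencode


def indeed_search_url(query: str, location: str, start: int = 0) -> str:
    """Build an Indeed search URL from a query, a location, and a result offset."""
    params = {"q": query, "l": location}
    if start:
        params["start"] = start
    # urlencode applies quote_plus by default, so spaces become "+".
    return "https://www.indeed.com/jobs?" + urlencode(params)
```

For example, `indeed_search_url("software engineer", "San Francisco")` yields the same URL used in the SimpleCrawl example earlier, and punctuation such as the comma in "Austin, TX" is percent-encoded automatically.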

Is there an Indeed API?

Indeed had a public job search API but discontinued it. The Indeed Publisher Program exists for approved job aggregators but has strict requirements. For most use cases, web scraping is the practical alternative.

How often should I scrape Indeed job listings?

Job listings change frequently: new postings appear hourly, and many listings expire within about 30 days. For ongoing monitoring, scrape at least daily. For market research, weekly captures are typically sufficient.
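
Whatever the schedule, comparing consecutive captures shows which listings are new and which have expired. The sketch below assumes the `jk` query parameter seen in viewjob URLs identifies a listing stably across captures; that is our assumption, and the helper names are illustrative.

```python
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import parse_qs, urlparse


def job_key(url: str) -> Optional[str]:
    """Extract the jk query parameter from a viewjob URL, if present."""
    values = parse_qs(urlparse(url).query).get("jk")
    return values[0] if values else None


def diff_snapshots(
    previous: Iterable[str], current: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (new_keys, expired_keys) between two captures of job URLs."""
    old = {key for key in map(job_key, previous) if key}
    new = {key for key in map(job_key, current) if key}
    # Keys only in the current capture are new; keys only in the old one expired.
    return new - old, old - new
```

Only the new keys need a full job-description fetch, which keeps daily runs much cheaper than re-scraping every listing.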
