SimpleCrawl

How to Scrape LinkedIn — Complete Guide (2026)

Learn how to scrape LinkedIn profiles, job listings, and company data. Covers DIY Python methods and the SimpleCrawl API for reliable LinkedIn data extraction.

6 min read

LinkedIn is the world's largest professional network with over 1 billion members, and scraping LinkedIn data enables powerful applications — from sales prospecting and recruiting to market research and competitive analysis. This guide covers practical methods for extracting LinkedIn data, including how to scrape LinkedIn profiles, job listings, and company pages at scale.

What Data Can You Extract from LinkedIn?

LinkedIn pages contain rich professional data across several content types:

  • Profile data — name, headline, current company, location, experience history, education, skills, endorsements, certifications
  • Job listings — title, company, location, salary range, description, required skills, application count, posting date
  • Company pages — overview, employee count, industry, headquarters, recent posts, funding data
  • Search results — people matching keywords, companies by industry, job postings by role
  • Posts and articles — content text, engagement metrics (likes, comments, shares), author info
  • Sales Navigator data — lead recommendations, account insights, InMail response rates

This data powers lead generation tools, recruiting pipelines, competitor analysis dashboards, and talent market research.
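If you're normalizing results from several of these sources into one pipeline, it can help to define record types up front. A minimal sketch (the field names are illustrative, not SimpleCrawl's output schema):

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A scraped profile record with safe defaults for missing fields."""
    name: str
    headline: str = ""
    location: str = ""
    skills: list[str] = field(default_factory=list)


@dataclass
class JobListing:
    """A scraped job posting; salary is often absent, so it defaults empty."""
    title: str
    company: str
    location: str = ""
    salary_range: str = ""
```

Typed records like these make downstream deduplication and storage far easier than passing raw dicts around.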

Challenges When Scraping LinkedIn

LinkedIn is one of the most aggressively protected sites against scraping:

Authentication Requirements

Most LinkedIn data requires a logged-in session. Public profile pages show limited information, while search results and full profiles need authentication — making scraping significantly more complex.

Aggressive Rate Limiting

LinkedIn monitors request frequency per account and IP. Exceeding their (undocumented) thresholds triggers temporary account restrictions or permanent bans. They track scrolling patterns, page view timing, and API call frequency.

Legal Pressure

LinkedIn has actively litigated against scrapers (see hiQ Labs v. LinkedIn). While the Ninth Circuit ruled that scraping public profiles doesn't violate the CFAA, LinkedIn continues to use technical measures and legal threats against scrapers.

JavaScript-Heavy SPA

LinkedIn is a React-based single-page application. Page content is rendered client-side by JavaScript, so raw HTML fetches return almost no useful content. You need full browser rendering to extract meaningful data.

Anti-Bot Detection

LinkedIn uses sophisticated bot detection, including browser fingerprinting, behavioral analysis (scroll speed, mouse patterns), honeypot links, and anomaly detection on session behavior.

Method 1: Using SimpleCrawl API (Easiest)

SimpleCrawl handles authentication, proxies, headless rendering, and anti-bot measures for LinkedIn data extraction:

curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.linkedin.com/in/satyanadella/",
    "format": "markdown",
    "render_js": true
  }'
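The same call from Python, using only the standard library. The endpoint, headers, and body fields mirror the curl example above; what the API returns in the response body is an assumption, so the send step is left as a comment:

```python
import json
import urllib.request

API_URL = "https://api.simplecrawl.com/v1/scrape"  # endpoint from the curl example


def build_request(api_key: str, url: str) -> urllib.request.Request:
    """Assemble the POST request body shown in the curl example."""
    body = json.dumps({
        "url": url,
        "format": "markdown",
        "render_js": True,  # required for LinkedIn's client-rendered pages
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (network call, so commented out here):
# with urllib.request.urlopen(build_request("sc_your_api_key", profile_url)) as resp:
#     markdown = resp.read().decode()
```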

For structured profile data:

{
  "url": "https://www.linkedin.com/in/satyanadella/",
  "format": "extract",
  "schema": {
    "name": "string",
    "headline": "string",
    "location": "string",
    "current_company": "string",
    "connections": "string",
    "experience": [{
      "title": "string",
      "company": "string",
      "duration": "string"
    }]
  }
}
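Assuming the extract endpoint returns JSON shaped like the schema above, flattening it for storage might look like this (the response envelope and helper name are assumptions, not SimpleCrawl's documented output):

```python
import json


def parse_profile(raw: str) -> dict:
    """Pull the schema fields out of an extract response (shape assumed)."""
    data = json.loads(raw)
    return {
        "name": data.get("name"),
        "headline": data.get("headline"),
        # Collapse each experience entry into one display string.
        "experience": [
            f"{e.get('title')} @ {e.get('company')} ({e.get('duration')})"
            for e in data.get("experience", [])
        ],
    }
```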

For job listings, simply point at a LinkedIn Jobs URL:

curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.linkedin.com/jobs/view/3912345678/",
    "format": "extract",
    "schema": {
      "title": "string",
      "company": "string",
      "location": "string",
      "salary_range": "string",
      "description": "string",
      "required_skills": ["string"]
    }
  }'

Method 2: DIY with Python (Manual)

Scraping LinkedIn with Python requires careful session management and browser automation.

Using Playwright for Public Profiles

from playwright.sync_api import sync_playwright

def scrape_linkedin_profile(profile_url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # A realistic viewport and user agent reduce (but don't eliminate)
        # the chance of being flagged as automation.
        context = browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
        )
        page = context.new_page()
        page.goto(profile_url, wait_until="networkidle")

        # NOTE: these class names are LinkedIn's current ones and change often.
        name = page.text_content("h1.text-heading-xlarge")
        headline = page.text_content("div.text-body-medium")

        experience = []
        exp_items = page.query_selector_all("section#experience li")
        for item in exp_items:
            title_el = item.query_selector("span[aria-hidden='true']")
            if title_el:
                experience.append(title_el.text_content().strip())

        context.close()
        browser.close()
        return {
            "name": name.strip() if name else None,
            "headline": headline.strip() if headline else None,
            "experience": experience,
        }

profile = scrape_linkedin_profile("https://www.linkedin.com/in/satyanadella/")
print(profile)

Scraping Job Listings

from urllib.parse import urlencode

from playwright.sync_api import sync_playwright

def scrape_linkedin_jobs(keyword: str, location: str) -> list:
    # URL-encode the query so spaces and punctuation survive.
    query = urlencode({"keywords": keyword, "location": location})
    url = f"https://www.linkedin.com/jobs/search/?{query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        # Give lazy-loaded job cards a moment to render.
        page.wait_for_timeout(2000)

        jobs = []
        cards = page.query_selector_all("div.job-search-card")
        for card in cards[:10]:
            title = card.query_selector("h3")
            company = card.query_selector("h4")
            location_el = card.query_selector("span.job-search-card__location")
            jobs.append({
                "title": title.text_content().strip() if title else None,
                "company": company.text_content().strip() if company else None,
                "location": location_el.text_content().strip() if location_el else None,
            })

        browser.close()
        return jobs

results = scrape_linkedin_jobs("software engineer", "San Francisco")

This approach is fragile — LinkedIn changes class names frequently and will block detected automation. For production scrapers, read our web scraping with JavaScript guide or Node.js guide for alternative approaches.
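One way to soften selector churn is to try several candidate selectors and keep the first that yields text. A small sketch that works with any Playwright-like page object (the helper name and selector lists are illustrative):

```python
def first_text(page, selectors):
    """Try selectors in order; return the first non-empty text, else None.

    `page` only needs a text_content(selector) method, so this works with
    Playwright pages as well as test doubles.
    """
    for sel in selectors:
        try:
            text = page.text_content(sel)
        except Exception:
            continue  # selector invalid or element gone; try the next one
        if text and text.strip():
            return text.strip()
    return None


# Usage idea: list the current selector first, older known ones as fallbacks.
# name = first_text(page, ["h1.text-heading-xlarge", "h1.top-card__name"])
```

When LinkedIn renames a class, you add a new candidate instead of shipping an emergency fix.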

Why SimpleCrawl Is Better for LinkedIn

| Feature              | DIY Python            | SimpleCrawl        |
| -------------------- | --------------------- | ------------------ |
| Auth handling        | Manual cookie/session | Managed            |
| Anti-bot bypass      | Very difficult        | Built-in           |
| JS rendering         | Playwright required   | Automatic          |
| Selector maintenance | Breaks weekly         | AI-adaptive        |
| Account risk         | High (bans)           | Managed proxy pool |
| Scale                | Limited               | Enterprise-ready   |

LinkedIn's aggressive anti-scraping measures make DIY approaches impractical for production. SimpleCrawl abstracts the complexity while keeping you within ethical bounds.

Legal Considerations

LinkedIn scraping carries specific legal nuances:

  • hiQ v. LinkedIn (2022) — the Ninth Circuit ruled that scraping publicly available LinkedIn profiles does not violate the CFAA. However, this ruling is narrow in scope.
  • LinkedIn's User Agreement — explicitly prohibits scraping. Violating ToS can result in account termination and legal action.
  • GDPR compliance — LinkedIn profile data is personal data under GDPR. Scraping EU profiles requires a lawful basis (legitimate interest) and data protection measures.
  • CCPA — California residents have rights over their personal information. Scraped data may be subject to deletion requests.
  • Do not scrape private information — messages, connection lists, and non-public profiles are off-limits.

Use our robots.txt checker to review LinkedIn's crawling permissions. Always consult legal counsel and consider LinkedIn's official API for approved use cases.

FAQ

Can I scrape LinkedIn without logging in?

Public profiles show limited data without authentication. Job listings and company pages are more accessible without login. For full profile data, you typically need an authenticated session — which SimpleCrawl handles through its managed infrastructure.

How do I avoid getting my LinkedIn account banned?

Use residential proxies, randomize request timing (2–5 seconds between pages), rotate user agents, and limit daily page views. Or use SimpleCrawl, which manages this automatically without risking your personal account.
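Those timing and rotation rules are straightforward to sketch (the user-agent pool below is illustrative; use current, real browser strings in practice):

```python
import random
import time

# Illustrative pool; rotate real, up-to-date browser strings in production.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
]


def polite_delay(low: float = 2.0, high: float = 5.0) -> float:
    """Sleep a random 2-5 s between page loads, as suggested above."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay


def pick_user_agent() -> str:
    """Rotate the user agent per session."""
    return random.choice(USER_AGENTS)
```

Randomized delays look far less machine-like than a fixed interval, which LinkedIn's behavioral analysis can flag.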

Is the LinkedIn API a better alternative?

LinkedIn's official API is restrictive — it requires partnership agreements for most data access and limits the types of data you can retrieve. For comprehensive data extraction, web scraping remains the most practical approach.

What's the best way to scrape LinkedIn job data?

For reliable job data extraction, use SimpleCrawl's extract mode with a defined schema. This returns structured JSON with title, company, salary, and requirements — without building custom parsers that break when LinkedIn updates their UI.

How much LinkedIn data can I scrape with SimpleCrawl?

The free tier includes 500 credits/month. LinkedIn pages typically cost 2 credits (due to JS rendering). Paid plans scale to hundreds of thousands of pages. See pricing for details.

Ready to try SimpleCrawl?

We're building the simplest web scraping API for AI. Join the waitlist and get 500 free credits at launch.
