Is it legal to scrape Yelp?

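Scraping publicly available data from Yelp sits in a legal gray area: business facts such as names and addresses are generally treated as public, but Yelp's Terms of Service prohibit automated access. Always review the Terms of Service and robots.txt before you start. SimpleCrawl helps you scrape responsibly while respecting rate limits.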

What is the best API to scrape Yelp?

SimpleCrawl is designed for scraping Yelp with built-in proxy rotation, JavaScript rendering, and anti-bot bypass. It returns clean markdown or structured JSON in a single API call.

How to Scrape Yelp — Complete Guide (2026)

Learn how to scrape Yelp business listings, reviews, and ratings. Compare Python scrapers with the SimpleCrawl API for reliable Yelp data extraction.

March 6, 2026 · 6 min read

Yelp is the leading platform for local business reviews, with data on millions of businesses across restaurants, services, retail, and more. Scraping Yelp enables competitive analysis, reputation monitoring, location intelligence, and lead generation for local businesses. This guide covers practical methods for extracting Yelp business data, reviews, and ratings at scale.

What Data Can You Extract from Yelp?

Yelp business and review pages contain rich structured data:

Business details — name, address, phone number, website, hours, price range ($ to $$$$), categories, amenities, services offered
Ratings and reviews — overall star rating, review count, individual review text, reviewer name, review date, star rating per review, photos attached to reviews
Photos — business photos, food photos, interior/exterior shots, user-uploaded images
Search results — businesses matching query and location, with ratings, review counts, and category tags
Menu data — menu items, prices, popular dishes (for restaurants)
Service quotes — request-a-quote data, response time, hiring rate
Competitor data — "People also viewed" businesses, similar businesses nearby

This data powers local SEO tools, reputation management platforms, price monitoring for local services, and market research for franchises and multi-location businesses.

Challenges When Scraping Yelp

Yelp has mature anti-scraping defenses:

JavaScript Rendering Requirements

Yelp uses React for its frontend. Review content, business details, and search results load dynamically. Static HTML fetches return skeleton markup with no useful data.

Review Pagination and Filtering

Yelp paginates reviews (10 per page) and moves some reviews into a separate "not recommended" section. Capturing all reviews means requesting each page in turn and then fetching the reviews that Yelp's recommendation algorithm has filtered out.
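As a rough illustration of the pagination described above, the sketch below builds one URL per review page. It assumes the `start` offset query parameter that this guide's Playwright example uses; `review_page_urls` is a hypothetical helper name, and the review count of 25 is made up.

```python
from math import ceil
from urllib.parse import urlencode

REVIEWS_PER_PAGE = 10  # page size stated in this guide


def review_page_urls(business_url: str, total_reviews: int) -> list[str]:
    """Return one URL per review page, using a `start` offset (assumed parameter)."""
    pages = ceil(total_reviews / REVIEWS_PER_PAGE)
    return [
        f"{business_url}?{urlencode({'start': page * REVIEWS_PER_PAGE})}"
        for page in range(pages)
    ]


urls = review_page_urls("https://www.yelp.com/biz/tartine-bakery-san-francisco", 25)
for url in urls:
    print(url)
```

With 25 reviews this yields three URLs, with offsets 0, 10, and 20. Reviews in the "not recommended" section live on a separate page and are not covered by this helper.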

Anti-Bot Detection

Yelp uses device fingerprinting, behavioral analysis, and rate limiting. Automated requests are detected through TLS fingerprinting, cookie analysis, and request timing patterns.
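Because request timing is one of the signals mentioned above, a scraper that wants to stay within a site's rate limits usually paces its requests. The following is a minimal sketch of that idea, not a guarantee against blocking: `polite_delays` and `fetch_all` are hypothetical helpers, and the 2-second base and 1.5-second jitter are arbitrary assumptions.

```python
import random
import time


def polite_delays(n_requests: int, base: float = 2.0, jitter: float = 1.5, seed=None) -> list[float]:
    """Return one delay per request: `base` seconds plus a random 0..jitter extra."""
    rng = random.Random(seed)
    return [base + rng.uniform(0, jitter) for _ in range(n_requests)]


def fetch_all(urls, fetch, sleep=time.sleep):
    """Call `fetch(url)` for each URL, pausing between requests.

    `fetch` is any callable you supply (for example a function wrapping an HTTP client).
    """
    results = []
    for url, delay in zip(urls, polite_delays(len(urls))):
        results.append(fetch(url))
        sleep(delay)
    return results


# Preview the pauses that would be used for five requests.
print([round(d, 2) for d in polite_delays(5, seed=1)])
```

Passing `sleep` as a parameter keeps the pacing logic easy to test without actually waiting.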

Content Obfuscation

Yelp occasionally obfuscates phone numbers and addresses using CSS tricks or JavaScript-rendered text, making simple HTML parsing insufficient.

API Limitations

Yelp's Fusion API has strict rate limits (5,000 requests/day) and does not expose full review text, only a 160-character snippet of each review. This makes the official API inadequate for review analysis.
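To see what the daily cap means in practice, here is a small back-of-the-envelope sketch. It uses the 5,000 requests/day figure quoted above; the assumption of two requests per business (one for details, one for review snippets) is illustrative, so adjust it to your own call pattern and plan.

```python
from math import ceil

DAILY_REQUEST_LIMIT = 5_000  # figure quoted in this guide; check your own plan


def days_needed(businesses: int, requests_per_business: int = 2) -> int:
    """Estimate how many days a crawl takes under the daily request cap."""
    total_requests = businesses * requests_per_business
    return ceil(total_requests / DAILY_REQUEST_LIMIT)


print(days_needed(12_000))  # 24,000 requests at 5,000/day -> 5
```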

Method 1: Using SimpleCrawl API (Easiest)

SimpleCrawl renders Yelp pages fully and returns structured business data:

curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.yelp.com/biz/tartine-bakery-san-francisco",
    "format": "extract",
    "schema": {
      "name": "string",
      "rating": "number",
      "review_count": "number",
      "price_range": "string",
      "categories": ["string"],
      "address": "string",
      "phone": "string",
      "hours": "string",
      "reviews": [{
        "author": "string",
        "rating": "number",
        "date": "string",
        "text": "string"
      }]
    }
  }'
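The same request can be prepared from Python. The sketch below only builds the request, using the endpoint, headers, and body fields from the curl example; `build_scrape_request` is a hypothetical helper name. The send step is left commented out because it needs a real API key and because the response shape is not documented in this guide.

```python
import json
import urllib.request

API_URL = "https://api.simplecrawl.com/v1/scrape"  # endpoint from the curl example


def build_scrape_request(api_key: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"url": page_url, "format": "extract", "schema": schema}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


schema = {"name": "string", "rating": "number", "review_count": "number"}
request = build_scrape_request(
    "sc_your_api_key",
    "https://www.yelp.com/biz/tartine-bakery-san-francisco",
    schema,
)
print(request.get_method(), request.full_url)

# To send it (assumes the API responds with JSON):
# with urllib.request.urlopen(request, timeout=30) as resp:
#     result = json.load(resp)
```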

For search results:

{
  "url": "https://www.yelp.com/search?find_desc=pizza&find_loc=Chicago",
  "format": "extract",
  "schema": {
    "businesses": [{
      "name": "string",
      "rating": "number",
      "review_count": "number",
      "price_range": "string",
      "address": "string",
      "categories": ["string"]
    }]
  }
}
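Search URLs like the one above are easier to generate than to write by hand once terms or locations contain spaces or commas. A minimal sketch, assuming the `find_desc` and `find_loc` parameters shown in the example URL; `yelp_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def yelp_search_url(term: str, location: str) -> str:
    """Build a Yelp search URL with properly encoded query parameters."""
    query = urlencode({"find_desc": term, "find_loc": location})
    return f"https://www.yelp.com/search?{query}"


print(yelp_search_url("pizza", "Chicago"))
print(yelp_search_url("coffee shops", "San Francisco, CA"))
```

The resulting string can be used as the `url` field of the request body.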

Method 2: DIY with Python (Manual)

Using Yelp's Fusion API (Limited)

import requests

API_KEY = "YOUR_YELP_API_KEY"

def search_yelp(term: str, location: str, limit: int = 20) -> list:
    """Search the Fusion API and return a simplified list of business dicts."""
    url = "https://api.yelp.com/v3/businesses/search"
    headers = {"Authorization": f"Bearer {API_KEY}"}
    params = {"term": term, "location": location, "limit": limit}

    # A timeout keeps a stalled connection from hanging the script.
    response = requests.get(url, headers=headers, params=params, timeout=10)
    # Fail loudly on 4xx/5xx responses (bad key, rate limit) instead of parsing an error body.
    response.raise_for_status()
    data = response.json()

    businesses = []
    for biz in data.get("businesses", []):
        businesses.append({
            "name": biz["name"],
            "rating": biz["rating"],
            "review_count": biz["review_count"],
            "price": biz.get("price", "N/A"),  # not every business has a price tier
            "address": ", ".join(biz["location"]["display_address"]),
            "phone": biz.get("display_phone", ""),
            "url": biz["url"],
        })

    return businesses

results = search_yelp("restaurants", "San Francisco")
for biz in results:
    print(f"{biz['name']} - {biz['rating']}★ ({biz['review_count']} reviews)")
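Given the rate limits discussed earlier, API calls will sometimes be throttled. The sketch below shows one common way to handle that: retrying with exponential backoff. It assumes throttling is signalled with HTTP 429, the conventional status for rate limiting; `with_backoff` and the shape of `call` are hypothetical, not part of any Yelp client.

```python
import time


def with_backoff(call, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Retry `call()` while it reports HTTP 429, doubling the wait each time.

    `call` is a zero-argument function returning (status_code, body).
    Returns the last (status_code, body) pair seen.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body


# Usage with a stand-in for a real API call: throttled twice, then successful.
responses = iter([(429, None), (429, None), (200, {"businesses": []})])
waits = []
status, body = with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)
```

In a real script, `call` would wrap the `requests.get` call and return `response.status_code` together with the parsed JSON.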

Web Scraping with Playwright (Full Reviews)

The Fusion API only returns review snippets. For full review text, you need to scrape:

from playwright.sync_api import sync_playwright

def scrape_yelp_reviews(business_url: str, max_pages: int = 3) -> list:
    """Collect author, text, and rating label from the first `max_pages` review pages."""
    reviews = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()

            for page_num in range(max_pages):
                # Yelp shows 10 reviews per page; `start` is the offset of the first one.
                url = f"{business_url}?start={page_num * 10}"
                page.goto(url, wait_until="networkidle")
                page.wait_for_timeout(2000)  # short pause so late-loading reviews can render

                review_els = page.query_selector_all("[data-review-id]")
                for el in review_els:
                    author = el.query_selector("a[href*='/user_details']")
                    # Hashed class names like this one change over time; expect to update it.
                    text = el.query_selector("span.raw__09f24__T4Ezm")
                    rating = el.query_selector("[aria-label*='star rating']")

                    reviews.append({
                        "author": author.text_content().strip() if author else None,
                        "text": text.text_content().strip() if text else None,
                        "rating": rating.get_attribute("aria-label") if rating else None,
                    })
        finally:
            # Close the browser even if a page fails to load.
            browser.close()

    return reviews

reviews = scrape_yelp_reviews("https://www.yelp.com/biz/tartine-bakery-san-francisco")
for r in reviews:
    text = r["text"] or ""  # text is None when the selector matched nothing
    print(f"{r['rating']} - {text[:100]}...")
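The scraper above stores the rating as the raw `aria-label` string. For analysis you will usually want a number. The sketch below assumes labels of the form "5 star rating" (inferred from the selector, not confirmed against Yelp's markup); `parse_star_rating` is a hypothetical helper name.

```python
import re
from typing import Optional


def parse_star_rating(aria_label: Optional[str]) -> Optional[float]:
    """Extract the numeric rating from a label such as '4.5 star rating'."""
    if not aria_label:
        return None
    match = re.search(r"(\d+(?:\.\d+)?)\s*star", aria_label)
    return float(match.group(1)) if match else None


print(parse_star_rating("5 star rating"))    # 5.0
print(parse_star_rating("4.5 star rating"))  # 4.5
print(parse_star_rating(None))               # None
```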

For more Python scraping techniques, see our web scraping with Python guide.

Why SimpleCrawl Is Better for Yelp

Feature	Yelp Fusion API	DIY Scraping	SimpleCrawl
Full review text	No (snippets only)	Yes (fragile)	Yes (stable)
Rate limits	5,000/day	IP-based	High throughput
Auth required	API key	No	API key
Review photos	URLs only	Complex	Included
Business hours	Yes	Parsing required	Structured
Maintenance	Low (official API)	High	Zero

SimpleCrawl bridges the gap between Yelp's limited API and unreliable DIY scraping. Full review text, structured data, no maintenance. See pricing for details.

Legal Considerations

Yelp's ToS prohibit scraping — Yelp explicitly prohibits automated access in their Terms of Service.
Yelp v. scrapers — Yelp has pursued legal action against review scraping operations, particularly those that republish or manipulate review data.
Review content ownership — Yelp reviews are copyrighted by their authors and licensed to Yelp. Republishing full reviews without permission raises copyright issues.
Business data — business names, addresses, and phone numbers are generally considered public facts and are less legally protected than review content.
Use the Fusion API when possible — for basic business search data, Yelp's official API is the safest option. Use scraping for data the API doesn't provide (full reviews, menus).

Check Yelp's crawling permissions with our robots.txt checker.
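You can run the same kind of check locally with Python's standard library. The sketch below parses a robots.txt body and asks whether a URL may be fetched. The rules in `SAMPLE_ROBOTS_TXT` are invented for illustration and are not Yelp's actual robots.txt; `is_allowed` and the `my-crawler` user agent are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; fetch the real file before relying on any answer.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /biz/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", "https://www.yelp.com/biz/tartine-bakery-san-francisco"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", "https://www.yelp.com/search?find_desc=pizza"))
```

Under these sample rules the business page is allowed and the search page is not. A robots.txt check covers crawling etiquette only; it does not replace reading the Terms of Service.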

FAQ

Can I get full Yelp reviews through the API?

No. Yelp's Fusion API returns only the first 160 characters of each review. For full review text, you need to scrape the web page or use SimpleCrawl's extract mode.

How do I scrape "not recommended" Yelp reviews?

Yelp hides reviews it deems unreliable behind a separate page. These can be accessed by navigating to the "not recommended reviews" link on the business page. SimpleCrawl can extract these with the right URL.

Is Yelp scraping useful for competitive analysis?

Absolutely. Monitoring competitor reviews, ratings, and response patterns provides actionable intelligence for local businesses. SimpleCrawl makes this data accessible in structured JSON.
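As a rough sketch of what that analysis can look like, the snippet below ranks competitors and computes a review-weighted average rating. The records are invented sample data shaped like the extract schema used earlier in this guide; both function names are hypothetical.

```python
from typing import Optional

# Invented sample data, shaped like the search-results schema in this guide.
businesses = [
    {"name": "Bakery B", "rating": 4.0, "review_count": 150},
    {"name": "Bakery A", "rating": 4.5, "review_count": 820},
    {"name": "Bakery C", "rating": 3.5, "review_count": 40},
]


def rank_competitors(records: list[dict]) -> list[dict]:
    """Sort by rating, breaking ties by review count, best first."""
    return sorted(records, key=lambda b: (b["rating"], b["review_count"]), reverse=True)


def weighted_average_rating(records: list[dict]) -> Optional[float]:
    """Average rating weighted by review count; None if there are no reviews."""
    total_reviews = sum(b["review_count"] for b in records)
    if total_reviews == 0:
        return None
    return sum(b["rating"] * b["review_count"] for b in records) / total_reviews


for biz in rank_competitors(businesses):
    print(biz["name"], biz["rating"], biz["review_count"])
print(round(weighted_average_rating(businesses), 2))
```

Weighting by review count keeps a business with a handful of reviews from skewing the market-wide average.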

How many Yelp pages can I scrape per day?

DIY scrapers typically get blocked after 200-500 pages. SimpleCrawl supports thousands of pages daily through its distributed proxy infrastructure. Check pricing for credit details.

Can I scrape Yelp restaurant menus?

Yes. Yelp menu pages are accessible through scraping. SimpleCrawl extracts menu items, prices, and descriptions when available on the business page.
