How to Scrape Amazon — Complete Guide (2026)
Learn how to scrape Amazon product data, prices, reviews, and rankings. Compare DIY Python scrapers with the SimpleCrawl API for reliable Amazon data extraction.
Amazon is the world's largest e-commerce marketplace, and scraping Amazon product data powers a wide range of applications — from price monitoring and competitive intelligence to market research and inventory tracking. Whether you're building a price comparison tool or feeding product data into an AI agent, this guide walks you through every method for extracting Amazon data at scale.
What Data Can You Extract from Amazon?
Amazon product pages contain a wealth of structured data that's valuable for businesses and developers:
- Product details — title, description, ASIN, brand, category, dimensions, weight
- Pricing data — current price, list price, deal price, Buy Box winner, price history
- Reviews and ratings — star rating, review count, individual review text, reviewer info
- Seller information — seller name, fulfillment method (FBA vs FBM), seller rating
- Search results — organic rankings, sponsored placements, suggested products
- Best Seller rankings — BSR by category, historical ranking data
- Product images — main image, gallery images, variant images
This data feeds use cases like price monitoring, MAP compliance, product catalog enrichment, and lead generation for Amazon sellers.
Challenges When Scraping Amazon
Amazon invests heavily in anti-bot systems. Here's what makes scraping Amazon difficult:
IP Blocking and Rate Limiting
Amazon tracks request patterns by IP address. Send too many requests from the same IP, and you'll get HTTP 503 responses or CAPTCHA challenges. Residential proxies help, but Amazon's detection goes beyond simple IP checks.
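The standard mitigation for 503s is to slow down and back off exponentially. A minimal sketch of that policy — the retry counts and jitter factors are illustrative, and the `fetch` callable is injected (standing in for a `requests.get` wrapper) so the logic stays testable:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def fetch_with_retries(fetch, url, max_retries=4, base=1.0):
    """Call fetch(url) until it returns HTTP 200 or retries run out.

    `fetch` is any callable returning a status code.
    """
    for attempt in range(max_retries):
        if fetch(url) == 200:
            return True
        time.sleep(backoff_delay(attempt, base=base))
    return False
```

Jitter matters: if many workers retry on the same fixed schedule, their requests re-synchronize and get blocked together.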
Dynamic Page Rendering
Amazon increasingly relies on JavaScript to render product data, lazy-load images, and populate pricing widgets. A simple HTTP GET request won't capture dynamically loaded content — you need a headless browser or a rendering API.
CAPTCHA Challenges
Amazon serves CAPTCHAs (image-based and audio) when it detects automated traffic. These block your scraper entirely until solved, adding latency and complexity to any DIY solution.
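A DIY scraper should at least detect when it has been served a CAPTCHA interstitial instead of a product page, so it can rotate proxies or back off rather than parse garbage. A rough heuristic — the marker strings below are ones commonly reported on Amazon's block page and may change at any time:

```python
def looks_like_captcha(html: str) -> bool:
    """Heuristic check for Amazon's CAPTCHA / robot-check interstitial."""
    lowered = html.lower()
    markers = (
        "robot check",                      # interstitial page title
        "type the characters you see",      # CAPTCHA prompt text
        "api-services-support@amazon.com",  # contact address shown on the block page
    )
    return any(marker in lowered for marker in markers)
```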
Frequent Layout Changes
Amazon A/B tests page layouts constantly. CSS selectors that work today may break tomorrow. Maintaining a custom scraper means constant debugging and selector updates.
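One way to soften selector rot is to try a list of fallback selectors in priority order, so a single layout variant doesn't zero out a field. A sketch — it works with any object exposing a BeautifulSoup-style `select_one`, and the selectors in the usage comment are illustrative fallbacks, not a maintained list:

```python
def first_match(soup, selectors):
    """Return the stripped text of the first selector that matches, else None.

    `soup` is anything with a BeautifulSoup-style select_one(), so the
    fallback logic can be exercised without fetching a live page.
    """
    for selector in selectors:
        node = soup.select_one(selector)
        if node is not None:
            text = node.get_text(strip=True)
            if text:
                return text
    return None

# Usage with BeautifulSoup (illustrative fallback chain):
# title = first_match(soup, ["#productTitle", "#title span", "h1.a-size-large"])
```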
Anti-Bot Fingerprinting
Amazon uses browser fingerprinting — checking TLS signatures, JavaScript execution patterns, and mouse movement — to distinguish bots from real users. Basic requests or urllib calls are trivially detected.
Method 1: Using SimpleCrawl API (Easiest)
SimpleCrawl handles JavaScript rendering, proxy rotation, and anti-bot bypass automatically. Here's how to scrape an Amazon product page in one API call:
```shell
curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B0CHX3QBCH",
    "format": "markdown"
  }'
```
For structured product data, use the AI extraction mode:
```json
{
  "url": "https://www.amazon.com/dp/B0CHX3QBCH",
  "format": "extract",
  "schema": {
    "title": "string",
    "price": "number",
    "rating": "number",
    "review_count": "number",
    "availability": "string",
    "seller": "string"
  }
}
```
SimpleCrawl returns clean JSON matching your schema — no CSS selectors, no parsing logic, no maintenance. Check the pricing page for rate limits and credit costs.
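The same call can be made from Python. The endpoint, header, and payload fields below mirror the curl and JSON examples above; the helper only assembles the request body (sending it is sketched in the comment), so treat this as a starting point rather than an official client:

```python
import json

API_URL = "https://api.simplecrawl.com/v1/scrape"

def build_extract_payload(url: str, schema: dict) -> str:
    """Assemble the JSON body for an AI-extraction scrape request."""
    return json.dumps({"url": url, "format": "extract", "schema": schema})

payload = build_extract_payload(
    "https://www.amazon.com/dp/B0CHX3QBCH",
    {"title": "string", "price": "number", "rating": "number"},
)

# Send it with any HTTP client, e.g.:
# resp = requests.post(API_URL, data=payload,
#                      headers={"Authorization": "Bearer sc_your_api_key",
#                               "Content-Type": "application/json"})
```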
Method 2: DIY with Python (Manual)
If you want full control, here's a Python approach using requests and BeautifulSoup. Be aware that this approach breaks frequently because of Amazon's anti-bot measures.
Basic Setup
```python
import requests
from bs4 import BeautifulSoup

# Realistic browser headers reduce (but don't eliminate) the chance of a block
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/122.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}

url = "https://www.amazon.com/dp/B0CHX3QBCH"
response = requests.get(url, headers=headers)

if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    rating = soup.select_one("#acrPopover span.a-size-base")
    print(f"Title: {title.text.strip() if title else 'N/A'}")
    print(f"Price: {price.text.strip() if price else 'N/A'}")
    print(f"Rating: {rating.text.strip() if rating else 'N/A'}")
else:
    print(f"Blocked: {response.status_code}")
```
Adding Proxy Rotation
To avoid IP bans, you'll need rotating proxies:
```python
import random

proxies_list = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

proxy = random.choice(proxies_list)
response = requests.get(url, headers=headers, proxies={"http": proxy, "https": proxy})
```
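Picking one random proxy per request still fails hard when that particular proxy is already burned. A small extension is to try several distinct proxies before giving up — here the `fetch` callable stands in for a real `requests.get` wrapper so the rotation logic stays testable:

```python
import random

def get_via_rotation(fetch, proxies, max_attempts=3):
    """Try up to max_attempts distinct proxies; return the first that
    yields HTTP 200, or None if every attempt is blocked.

    `fetch(proxy)` should perform the request and return a status code.
    """
    for proxy in random.sample(proxies, min(max_attempts, len(proxies))):
        if fetch(proxy) == 200:
            return proxy
    return None
```

In production you would also demote proxies that repeatedly fail, rather than sampling uniformly forever.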
Handling JavaScript with Playwright
For dynamically rendered content (variant prices, lazy-loaded reviews):
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.amazon.com/dp/B0CHX3QBCH")
    page.wait_for_selector("#productTitle")
    title = page.text_content("#productTitle")
    price = page.text_content(".a-price .a-offscreen")
    print(f"Title: {title.strip()}")
    print(f"Price: {price.strip()}")
    browser.close()
```
This approach works but requires managing browser instances, memory, and concurrency yourself. For a deeper dive, see our web scraping with Python guide.
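When scraping many products, reuse one browser instance and loop over ASINs instead of relaunching Chromium per page. A tiny helper for building the canonical short product URLs (the `/dp/` path is Amazon's standard short product link; the loop in the comment assumes a Playwright `page` as in the example above):

```python
def product_url(asin: str, domain: str = "www.amazon.com") -> str:
    """Canonical short product-page URL for an ASIN."""
    return f"https://{domain}/dp/{asin}"

# With one Playwright browser open, visit each product in turn:
# for asin in ["B0CHX3QBCH", ...]:
#     page.goto(product_url(asin))
#     page.wait_for_selector("#productTitle")
```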
Why SimpleCrawl Is Better for Amazon
| Feature | DIY Python | SimpleCrawl |
|---|---|---|
| Setup time | Hours to days | Minutes |
| Proxy management | Manual ($$) | Built-in |
| CAPTCHA solving | Manual integration | Automatic |
| JS rendering | Playwright/Selenium | Automatic |
| Maintenance | Constant | Zero |
| Structured output | Custom parsing | Schema-based AI extraction |
| Scalability | Limited by infra | Scales to millions |
The DIY approach is educational and gives you full control, but for production use cases — especially at scale — an API like SimpleCrawl eliminates the operational burden. Learn more about how we compare to other tools on our comparison page.
Legal Considerations
Scraping Amazon exists in a legal gray area. Key points:
- Public data is generally fair game — U.S. courts have held that scraping publicly accessible data does not, by itself, violate the CFAA (see hiQ Labs v. LinkedIn).
- Amazon's ToS prohibit scraping — violating ToS isn't necessarily illegal, but Amazon can terminate your account or restrict access.
- Don't scrape personal data — extracting PII (customer emails, names) without consent violates GDPR and CCPA.
- Respect rate limits — hammering Amazon's servers can constitute a denial-of-service attack.
- robots.txt — Amazon's robots.txt disallows many paths. Review it with our robots.txt checker.
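The "respect rate limits" point is easy to enforce client-side with a minimal throttle that guarantees a floor between consecutive requests. The 2-second default here is an arbitrary illustration, not a documented Amazon limit:

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = float("-inf")  # first wait() returns immediately

    def wait(self):
        remaining = self.min_interval - (time.monotonic() - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()
```

Call `throttle.wait()` before each request; unlike a bare `time.sleep`, it only sleeps for whatever part of the interval your own processing hasn't already used up.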
SimpleCrawl includes built-in rate limiting and respects ethical scraping practices. Always consult legal counsel for your specific use case.
FAQ
How often does Amazon change their page structure?
Amazon runs continuous A/B tests, so page layouts can change weekly. DIY scrapers need constant maintenance. SimpleCrawl's AI-powered extraction adapts to layout changes automatically.
Can I scrape Amazon without getting blocked?
Yes, but you need rotating residential proxies, proper headers, request throttling, and ideally a headless browser. SimpleCrawl bundles all of this into a single API call.
How many Amazon products can I scrape per day?
With a DIY setup, you're limited by your proxy pool and infrastructure. SimpleCrawl's free tier includes 500 credits/month — each product page costs 1 credit. Paid plans support millions of pages per month.
Is it better to use the Amazon Product Advertising API?
Amazon's official API is limited — it doesn't expose review text, BSR history, or all seller data. It also has strict rate limits and requires an Associates account. Scraping gives you access to everything visible on the page.
What format does SimpleCrawl return Amazon data in?
SimpleCrawl returns data in clean markdown (ideal for RAG pipelines) or structured JSON matching a schema you define. See the docs for full API reference.
Ready to try SimpleCrawl?
We're building the simplest web scraping API for AI. Join the waitlist and get 500 free credits at launch.
More scraping guides
How to Scrape Google — Complete Guide (2026)
Learn how to scrape Google search results, SERP data, featured snippets, and People Also Ask boxes. Compare Python scrapers with the SimpleCrawl SERP API.
How to Scrape Indeed — Complete Guide (2026)
Learn how to scrape Indeed job listings, salaries, and company reviews. Compare Python scrapers with the SimpleCrawl API for reliable Indeed data extraction.
How to Scrape LinkedIn — Complete Guide (2026)
Learn how to scrape LinkedIn profiles, job listings, and company data. Covers DIY Python methods and the SimpleCrawl API for reliable LinkedIn data extraction.
How to Scrape Reddit — Complete Guide (2026)
Learn how to scrape Reddit posts, comments, and subreddit data. Compare Reddit's API, Python scrapers, and SimpleCrawl for reliable Reddit data extraction.