How to Scrape Twitter/X — Complete Guide (2026)
Learn how to scrape Twitter (X) posts, profiles, and trends. Compare DIY Python methods with the SimpleCrawl API for reliable Twitter data extraction.
Twitter (now X) remains one of the most important real-time data sources on the internet. Scraping Twitter data enables sentiment analysis, brand monitoring, trend detection, news aggregation, and influencer research. Since Twitter's API became prohibitively expensive in 2023, web scraping has become the primary method for accessing Twitter data at scale. This guide covers practical approaches for extracting tweets, profiles, and engagement data.
What Data Can You Extract from Twitter/X?
Twitter pages expose rich social data across multiple content types:
- Tweets/posts — text, media (images, videos, GIFs), timestamps, engagement metrics (likes, retweets, replies, views, bookmarks)
- User profiles — display name, handle, bio, follower/following count, verified status, join date, location, website
- Threads — connected tweet chains with full conversation context
- Search results — tweets matching keywords, hashtags, or advanced search operators
- Trending topics — trending hashtags, topics by location, "What's happening" data
- Lists — curated user lists and their member tweets
- Spaces — live audio room metadata, scheduled spaces, participant info
This data powers brand monitoring dashboards, content aggregation platforms, financial sentiment tools, and AI agent knowledge bases.
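Before choosing a scraping method, it helps to model these fields as simple records. A minimal sketch using dataclasses; the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """One scraped tweet/post (illustrative field names)."""
    text: str
    timestamp: str
    likes: int = 0
    retweets: int = 0
    replies: int = 0


@dataclass
class Profile:
    """One scraped user profile with its recent tweets."""
    handle: str
    display_name: str
    bio: str = ""
    followers: int = 0
    following: int = 0
    recent_tweets: list[Tweet] = field(default_factory=list)
```

Typed records like these make downstream steps (deduplication, sentiment scoring, storage) much easier than passing raw dicts around.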
Challenges When Scraping Twitter/X
Twitter has become one of the hardest platforms to scrape since Elon Musk's acquisition:
Aggressive Rate Limiting
Twitter now rate-limits even logged-in users to viewing ~600 posts/day on free accounts. Scraping encounters these same limits, plus additional API-level throttling.
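When you do hit a limit, back off rather than hammer the endpoint. A minimal exponential-backoff sketch; the retry counts and delays are arbitrary choices, not Twitter-documented values:

```python
import random
import time


def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch() until it returns a result.

    Treats a None return as a rate-limit signal: sleeps
    base_delay * 2**attempt seconds plus jitter, then retries.
    """
    for attempt in range(max_retries):
        result = fetch()
        if result is not None:
            return result
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("still rate-limited after retries")
```

The jitter spreads retries out so multiple workers don't all retry at the same instant.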
Login Walls
Twitter increasingly requires authentication to view content. Many pages redirect unauthenticated users to login screens, making anonymous scraping nearly impossible.
Advanced Anti-Bot Systems
Twitter uses sophisticated bot detection: browser fingerprinting, behavioral analysis, request pattern monitoring, and Arkose Labs CAPTCHA challenges. Basic HTTP requests are blocked immediately.
GraphQL API Complexity
Twitter's frontend uses a complex GraphQL API with encrypted query parameters that change frequently. Reverse-engineering these endpoints requires constant maintenance.
Legal and Ethical Concerns
Twitter's ToS explicitly prohibit scraping. The platform has pursued legal action against scraping operations, and Elon Musk has publicly threatened lawsuits against data scrapers.
Method 1: Using SimpleCrawl API (Easiest)
SimpleCrawl handles Twitter's authentication walls, anti-bot systems, and dynamic rendering:
```bash
curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://x.com/OpenAI",
    "format": "extract",
    "schema": {
      "name": "string",
      "handle": "string",
      "bio": "string",
      "followers": "string",
      "following": "string",
      "recent_tweets": [{
        "text": "string",
        "likes": "number",
        "retweets": "number",
        "date": "string"
      }]
    }
  }'
```
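The same call works from Python with requests. The payload mirrors the curl example above; the response shape is an assumption here, so inspect `resp.json()` before relying on specific keys:

```python
import requests

API_URL = "https://api.simplecrawl.com/v1/scrape"


def build_profile_payload(handle: str) -> dict:
    """Build an extract-mode payload for a profile page."""
    return {
        "url": f"https://x.com/{handle}",
        "format": "extract",
        "schema": {
            "name": "string",
            "handle": "string",
            "bio": "string",
            "followers": "string",
        },
    }


def scrape_profile(handle: str, api_key: str) -> dict:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_profile_payload(handle),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```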
For scraping tweets by search query:
```json
{
  "url": "https://x.com/search?q=web%20scraping%20API&src=typed_query&f=live",
  "format": "extract",
  "render_js": true,
  "schema": {
    "tweets": [{
      "author": "string",
      "handle": "string",
      "text": "string",
      "likes": "number",
      "retweets": "number",
      "timestamp": "string"
    }]
  }
}
```
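Note the URL structure: the `q` parameter must be URL-encoded, and `f=live` selects the "Latest" tab. Building search URLs safely with the standard library:

```python
from urllib.parse import quote, urlencode


def build_search_url(query: str, latest: bool = True) -> str:
    """Build an x.com search URL; f=live selects the Latest tab."""
    params = {"q": query, "src": "typed_query"}
    if latest:
        params["f"] = "live"
    # quote_via=quote encodes spaces as %20 (the form x.com uses)
    # instead of urlencode's default '+'
    return "https://x.com/search?" + urlencode(params, quote_via=quote)
```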
Method 2: DIY with Python (Manual)
Using Playwright for Rendered Content
Since Twitter is a React SPA, you need a headless browser:
```python
from playwright.sync_api import sync_playwright
import time


def scrape_twitter_profile(handle: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
        )
        page = context.new_page()
        page.goto(f"https://x.com/{handle}", wait_until="networkidle")
        time.sleep(3)  # allow lazy-loaded tweets to render

        name = page.text_content("[data-testid='UserName'] span")
        bio = page.text_content("[data-testid='UserDescription']")

        tweets = []
        tweet_els = page.query_selector_all("[data-testid='tweet']")
        for tweet_el in tweet_els[:5]:
            text_el = tweet_el.query_selector("[data-testid='tweetText']")
            if text_el:
                tweets.append(text_el.text_content())

        browser.close()
        return {
            "name": name,
            "bio": bio,
            "recent_tweets": tweets,
        }
```
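Twitter renders engagement counts in abbreviated form ("1.2K", "3.4M"), so storing them as numbers needs a small parser. A sketch:

```python
def parse_count(text: str) -> int:
    """Convert Twitter-style abbreviated counts ('1.2K', '3.4M') to ints."""
    text = text.strip().replace(",", "")
    if not text:
        return 0
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        # round() guards against float artifacts (3.4 * 1e6 != 3400000.0)
        return int(round(float(text[:-1]) * multipliers[suffix]))
    return int(text)
```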
Using Open-Source Libraries and Nitter
Several open-source tools exist for Twitter scraping. Libraries like snscrape were popular, though Twitter frequently breaks them:
```python
# Note: Many Twitter scraping libraries are frequently broken
# due to Twitter's constant API changes. Always check for
# the latest compatible version.

# Alternative: use Nitter instances (when available)
import requests
from bs4 import BeautifulSoup


def scrape_nitter(handle: str) -> list:
    """Scrape via Nitter (Twitter frontend proxy)."""
    nitter_instances = [
        "https://nitter.net",
        "https://nitter.privacydev.net",
    ]
    for instance in nitter_instances:
        try:
            resp = requests.get(
                f"{instance}/{handle}",
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=10,
            )
            if resp.status_code == 200:
                soup = BeautifulSoup(resp.text, "html.parser")
                tweets = []
                for item in soup.select(".timeline-item .tweet-content"):
                    tweets.append(item.text.strip())
                return tweets
        except requests.RequestException:
            continue
    return []  # every instance failed
```
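Since Nitter instances go down often, a shared session with connection pooling and automatic retries is more robust than a bare `requests.get` per instance. A sketch using urllib3's `Retry`:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Session that retries transient errors with exponential backoff."""
    session = requests.Session()
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers["User-Agent"] = "Mozilla/5.0"
    return session
```

Pass the session into `scrape_nitter` in place of the module-level `requests.get` to reuse connections across instances.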
For more Python scraping patterns, see our web scraping with Python guide.
Why SimpleCrawl Is Better for Twitter
| Feature | Twitter API (X) | DIY Scraping | SimpleCrawl |
|---|---|---|---|
| Cost | $100+/month | Free (unstable) | From $0 |
| Data access | Limited by tier | Full page | Full page |
| Auth required | OAuth + paid plan | Often required | API key only |
| Historical tweets | Limited | Very difficult | Supported |
| Real-time data | Streaming ($$) | Polling | On-demand |
| Maintenance | API changes | Constant breakage | Managed |
Twitter's official API starts at $100/month for basic access and $42,000/month for enterprise. SimpleCrawl provides comparable data access at a fraction of the cost. Compare options on our comparison page.
Legal Considerations
- Twitter's ToS prohibit scraping — violating ToS is a contractual matter, not a criminal offense under current case law.
- The X v. scrapers landscape — X Corp. has sent cease-and-desist letters to scraping operations, though no major court ruling has established scraping public tweets as illegal.
- Copyright concerns — individual tweets may be copyrighted by their authors. Bulk collection for commercial use raises fair use questions.
- GDPR — profile data (name, bio, location) is personal data under GDPR. Scraping EU users requires compliance with data protection regulations.
- Limit your impact — excessive scraping degrades the platform for other users. Respect reasonable rate limits.
Review Twitter's crawling permissions with our robots.txt checker.
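You can also check robots.txt rules programmatically with the standard library. The rules below are a hypothetical example for illustration, not X's actual file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration -- fetch the real file
# from https://x.com/robots.txt before relying on this.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(path: str, agent: str = "*") -> bool:
    """Check a URL path against the sample robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(SAMPLE_ROBOTS.splitlines())
    return parser.can_fetch(agent, f"https://x.com{path}")
```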
FAQ
Can I still scrape Twitter for free?
Free scraping is possible but increasingly difficult. Twitter's login walls and anti-bot measures make unauthenticated scraping nearly impossible. SimpleCrawl's free tier (500 credits/month) is the most practical free option.
What happened to snscrape and other Twitter scrapers?
Most open-source Twitter scrapers broke after Twitter's API changes in 2023. Some are maintained by the community, but they frequently stop working when Twitter updates their internal API. SimpleCrawl provides a stable alternative.
How do I scrape Twitter trends?
Pass the Twitter Explore URL to SimpleCrawl with extract mode. The response includes trending topics, hashtags, and associated tweet counts.
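A hedged example payload, in the same shape as the earlier examples; the schema field names here are illustrative, not a documented SimpleCrawl contract:

```json
{
  "url": "https://x.com/explore",
  "format": "extract",
  "render_js": true,
  "schema": {
    "trends": [{
      "name": "string",
      "category": "string",
      "tweet_count": "string"
    }]
  }
}
```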
Can I get historical Twitter data?
Twitter's official API limits historical access. Web scraping can access older tweets through search and profile pages, though Twitter limits how far back you can scroll. SimpleCrawl can extract whatever's accessible on the rendered page.
Is Twitter data useful for AI training?
Twitter data is widely used for sentiment analysis, NLP training, and knowledge bases. However, check X's terms regarding AI/ML use — they've added restrictions on training models with their data.
Ready to try SimpleCrawl?
We're building the simplest web scraping API for AI. Join the waitlist and get 500 free credits at launch.
More scraping guides
How to Scrape Amazon — Complete Guide (2026)
Learn how to scrape Amazon product data, prices, reviews, and rankings. Compare DIY Python scrapers with the SimpleCrawl API for reliable Amazon data extraction.
How to Scrape Google — Complete Guide (2026)
Learn how to scrape Google search results, SERP data, featured snippets, and People Also Ask boxes. Compare Python scrapers with the SimpleCrawl SERP API.
How to Scrape Indeed — Complete Guide (2026)
Learn how to scrape Indeed job listings, salaries, and company reviews. Compare Python scrapers with the SimpleCrawl API for reliable Indeed data extraction.
How to Scrape LinkedIn — Complete Guide (2026)
Learn how to scrape LinkedIn profiles, job listings, and company data. Covers DIY Python methods and the SimpleCrawl API for reliable LinkedIn data extraction.