How to Scrape Twitter/X — Complete Guide (2026)
Learn how to scrape Twitter (X) posts, profiles, and trends. Compare DIY Python methods with the SimpleCrawl API for reliable Twitter data extraction.
Twitter (now X) remains one of the most important real-time data sources on the internet. Scraping Twitter data enables sentiment analysis, brand monitoring, trend detection, news aggregation, and influencer research. Since Twitter's API became prohibitively expensive in 2023, web scraping has become the primary method for accessing Twitter data at scale. This guide covers practical approaches for extracting tweets, profiles, and engagement data.
What Data Can You Extract from Twitter/X?
Twitter pages expose rich social data across multiple content types:
- Tweets/posts — text, media (images, videos, GIFs), timestamps, engagement metrics (likes, retweets, replies, views, bookmarks)
- User profiles — display name, handle, bio, follower/following count, verified status, join date, location, website
- Threads — connected tweet chains with full conversation context
- Search results — tweets matching keywords, hashtags, or advanced search operators
- Trending topics — trending hashtags, topics by location, "What's happening" data
- Lists — curated user lists and their member tweets
- Spaces — live audio room metadata, scheduled spaces, participant info
This data powers brand monitoring dashboards, content aggregation platforms, financial sentiment tools, and AI agent knowledge bases.
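Before choosing a scraping method, it helps to model these fields as simple records. A minimal sketch using dataclasses; the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """One scraped tweet/post (illustrative field names)."""
    text: str
    timestamp: str
    likes: int = 0
    retweets: int = 0
    replies: int = 0


@dataclass
class Profile:
    """One scraped user profile with its recent tweets."""
    handle: str
    display_name: str
    bio: str = ""
    followers: int = 0
    following: int = 0
    recent_tweets: list[Tweet] = field(default_factory=list)
```

Typed records like these make downstream steps (deduplication, sentiment scoring, storage) much easier than passing raw dicts around.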
Challenges When Scraping Twitter/X
Twitter has become one of the hardest platforms to scrape since Elon Musk's acquisition:
Aggressive Rate Limiting
Twitter now rate-limits even logged-in users to viewing ~600 posts/day on free accounts. Scraping encounters these same limits, plus additional API-level throttling.
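When you do hit a limit, back off rather than hammer the endpoint. A minimal exponential-backoff sketch; the retry counts and delays are arbitrary choices, not Twitter-documented values:

```python
import random
import time


def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch() until it returns a result.

    Treats a None return as a rate-limit signal: sleeps
    base_delay * 2**attempt seconds plus jitter, then retries.
    """
    for attempt in range(max_retries):
        result = fetch()
        if result is not None:
            return result
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("still rate-limited after retries")
```

The jitter spreads retries out so multiple workers don't all retry at the same instant.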
Login Walls
Twitter increasingly requires authentication to view content. Many pages redirect unauthenticated users to login screens, making anonymous scraping nearly impossible.
Advanced Anti-Bot Systems
Twitter uses sophisticated bot detection: browser fingerprinting, behavioral analysis, request pattern monitoring, and Arkose Labs CAPTCHA challenges. Basic HTTP requests are blocked immediately.
GraphQL API Complexity
Twitter's frontend uses a complex GraphQL API with encrypted query parameters that change frequently. Reverse-engineering these endpoints requires constant maintenance.
Legal and Ethical Concerns
Twitter's ToS explicitly prohibit scraping. The platform has pursued legal action against scraping operations, and Elon Musk has publicly threatened lawsuits against data scrapers.
Method 1: Using SimpleCrawl API (Easiest)
SimpleCrawl handles Twitter's authentication walls, anti-bot systems, and dynamic rendering:
```bash
curl -X POST https://api.simplecrawl.com/v1/scrape \
  -H "Authorization: Bearer sc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://x.com/OpenAI",
    "format": "extract",
    "schema": {
      "name": "string",
      "handle": "string",
      "bio": "string",
      "followers": "string",
      "following": "string",
      "recent_tweets": [{
        "text": "string",
        "likes": "number",
        "retweets": "number",
        "date": "string"
      }]
    }
  }'
```
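The same call works from Python with requests. The payload mirrors the curl example above; the response shape is an assumption here, so inspect `resp.json()` before relying on specific keys:

```python
import requests

API_URL = "https://api.simplecrawl.com/v1/scrape"


def build_profile_payload(handle: str) -> dict:
    """Build an extract-mode payload for a profile page."""
    return {
        "url": f"https://x.com/{handle}",
        "format": "extract",
        "schema": {
            "name": "string",
            "handle": "string",
            "bio": "string",
            "followers": "string",
        },
    }


def scrape_profile(handle: str, api_key: str) -> dict:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_profile_payload(handle),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```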
For scraping tweets by search query:
```json
{
  "url": "https://x.com/search?q=web%20scraping%20API&src=typed_query&f=live",
  "format": "extract",
  "render_js": true,
  "schema": {
    "tweets": [{
      "author": "string",
      "handle": "string",
      "text": "string",
      "likes": "number",
      "retweets": "number",
      "timestamp": "string"
    }]
  }
}
```
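Note the URL structure: the `q` parameter must be URL-encoded, and `f=live` selects the "Latest" tab. Building search URLs safely with the standard library:

```python
from urllib.parse import quote, urlencode


def build_search_url(query: str, latest: bool = True) -> str:
    """Build an x.com search URL; f=live selects the Latest tab."""
    params = {"q": query, "src": "typed_query"}
    if latest:
        params["f"] = "live"
    # quote_via=quote encodes spaces as %20 (the form x.com uses)
    # instead of urlencode's default '+'
    return "https://x.com/search?" + urlencode(params, quote_via=quote)
```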
Method 2: DIY with Python (Manual)
Using Playwright for Rendered Content
Since Twitter is a React SPA, you need a headless browser:
```python
from playwright.sync_api import sync_playwright
import time


def scrape_twitter_profile(handle: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 Chrome/122.0.0.0 Safari/537.36",
        )
        page = context.new_page()
        page.goto(f"https://x.com/{handle}", wait_until="networkidle")
        time.sleep(3)  # allow lazy-loaded tweets to render

        name = page.text_content("[data-testid='UserName'] span")
        bio = page.text_content("[data-testid='UserDescription']")

        tweets = []
        tweet_els = page.query_selector_all("[data-testid='tweet']")
        for tweet_el in tweet_els[:5]:
            text_el = tweet_el.query_selector("[data-testid='tweetText']")
            if text_el:
                tweets.append(text_el.text_content())

        browser.close()
        return {
            "name": name,
            "bio": bio,
            "recent_tweets": tweets,
        }
```
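Twitter renders engagement counts in abbreviated form ("1.2K", "3.4M"), so storing them as numbers needs a small parser. A sketch:

```python
def parse_count(text: str) -> int:
    """Convert Twitter-style abbreviated counts ('1.2K', '3.4M') to ints."""
    text = text.strip().replace(",", "")
    if not text:
        return 0
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        # round() guards against float artifacts (3.4 * 1e6 != 3400000.0)
        return int(round(float(text[:-1]) * multipliers[suffix]))
    return int(text)
```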
Using Open-Source Libraries and Nitter
Several open-source tools exist for Twitter scraping. Libraries like snscrape were popular, though Twitter frequently breaks them:
```python
# Note: Many Twitter scraping libraries are frequently broken
# due to Twitter's constant API changes. Always check for
# the latest compatible version.

# Alternative: use Nitter instances (when available)
import requests
from bs4 import BeautifulSoup


def scrape_nitter(handle: str) -> list:
    """Scrape via Nitter (Twitter frontend proxy)."""
    nitter_instances = [
        "https://nitter.net",
        "https://nitter.privacydev.net",
    ]
    for instance in nitter_instances:
        try:
            resp = requests.get(
                f"{instance}/{handle}",
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=10,
            )
            if resp.status_code == 200:
                soup = BeautifulSoup(resp.text, "html.parser")
                tweets = []
                for item in soup.select(".timeline-item .tweet-content"):
                    tweets.append(item.text.strip())
                return tweets
        except requests.RequestException:
            continue
    return []  # every instance failed
```
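Since Nitter instances go down often, a shared session with connection pooling and automatic retries is more robust than a bare `requests.get` per instance. A sketch using urllib3's `Retry`:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Session that retries transient errors with exponential backoff."""
    session = requests.Session()
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers["User-Agent"] = "Mozilla/5.0"
    return session
```

Pass the session into `scrape_nitter` in place of the module-level `requests.get` to reuse connections across instances.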
For more Python scraping patterns, see our web scraping with Python guide.
Why SimpleCrawl Is Better for Twitter
| Feature | Twitter API (X) | DIY Scraping | SimpleCrawl |
|---|---|---|---|
| Cost | $100+/month | Free (unstable) | From $0 |
| Data access | Limited by tier | Full page | Full page |
| Auth required | OAuth + paid plan | Often required | API key only |
| Historical tweets | Limited | Very difficult | Supported |
| Real-time data | Streaming ($$) | Polling | On-demand |
| Maintenance | API changes | Constant breakage | Managed |
Twitter's official API starts at $100/month for basic access and $42,000/month for enterprise. SimpleCrawl provides comparable data access at a fraction of the cost. Compare options on our comparison page.
Legal Considerations
- Twitter's ToS prohibit scraping — violating ToS is a contractual matter, not a criminal offense under current case law.
- The X v. scrapers landscape — X Corp. has sent cease-and-desist letters to scraping operations, though no major court ruling has established scraping public tweets as illegal.
- Copyright concerns — individual tweets may be copyrighted by their authors. Bulk collection for commercial use raises fair use questions.
- GDPR — profile data (name, bio, location) is personal data under GDPR. Scraping EU users requires compliance with data protection regulations.
- Limit your impact — excessive scraping degrades the platform for other users. Respect reasonable rate limits.
Review Twitter's crawling permissions with our robots.txt checker.
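You can also check robots.txt rules programmatically with the standard library. The rules below are a hypothetical example for illustration, not X's actual file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration -- fetch the real file
# from https://x.com/robots.txt before relying on this.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(path: str, agent: str = "*") -> bool:
    """Check a URL path against the sample robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(SAMPLE_ROBOTS.splitlines())
    return parser.can_fetch(agent, f"https://x.com{path}")
```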
FAQ
Can I still scrape Twitter for free?
Free scraping is possible but increasingly difficult. Twitter's login walls and anti-bot measures make unauthenticated scraping nearly impossible. SimpleCrawl's free tier (500 credits/month) is the most practical free option.
What happened to snscrape and other Twitter scrapers?
Most open-source Twitter scrapers broke after Twitter's API changes in 2023. Some are maintained by the community, but they frequently stop working when Twitter updates their internal API. SimpleCrawl provides a stable alternative.
How do I scrape Twitter trends?
Pass the Twitter Explore URL to SimpleCrawl with extract mode. The response includes trending topics, hashtags, and associated tweet counts.
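A hedged example payload, in the same shape as the earlier examples; the schema field names here are illustrative, not a documented SimpleCrawl contract:

```json
{
  "url": "https://x.com/explore",
  "format": "extract",
  "render_js": true,
  "schema": {
    "trends": [{
      "name": "string",
      "category": "string",
      "tweet_count": "string"
    }]
  }
}
```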
Can I get historical Twitter data?
Twitter's official API limits historical access. Web scraping can access older tweets through search and profile pages, though Twitter limits how far back you can scroll. SimpleCrawl can extract whatever's accessible on the rendered page.
Is Twitter data useful for AI training?
Twitter data is widely used for sentiment analysis, NLP training, and knowledge bases. However, check X's terms regarding AI/ML use — they've added restrictions on training models with their data.
Ready to try SimpleCrawl?
We're building the simplest web scraping API for AI. Join the waitlist and get 500 free credits at launch.
More scraping guides
How to Scrape Amazon — Complete Guide (2026)
Learn how to scrape Amazon product data, prices, reviews, and rankings. Compare DIY Python scrapers with the SimpleCrawl API for reliable Amazon data extraction.
How to Scrape Google — Complete Guide (2026)
Learn how to scrape Google search results, SERP data, featured snippets, and People Also Ask boxes. Compare Python scrapers with the SimpleCrawl SERP API.
How to Scrape Indeed — Complete Guide (2026)
Learn how to scrape Indeed job listings, salaries, and company reviews. Compare Python scrapers with the SimpleCrawl API for reliable Indeed data extraction.
How to Scrape LinkedIn — Complete Guide (2026)
Learn how to scrape LinkedIn profiles, job listings, and company data. Covers DIY Python methods and the SimpleCrawl API for reliable LinkedIn data extraction.