What are CSS Selectors? — SimpleCrawl Glossary
CSS selectors are patterns used to select and target specific HTML elements in a web page. In web scraping, they are the primary way to locate and extract data.
Definition
CSS selectors are pattern-matching expressions used to target specific HTML elements in a document. Originally designed for applying styles in Cascading Style Sheets (CSS), selectors have become the standard way to locate and extract elements in web scraping, browser automation, and DOM manipulation.
A CSS selector like div.product > h2.title targets all <h2> elements with the class title that are direct children of a <div> with the class product. This precision makes selectors invaluable for extracting specific data points from complex web pages.
How CSS Selectors Work
CSS selectors match elements in the DOM tree that a parser builds from the HTML. A match is based on an element's tag name, attributes, classes, ID, and relationship to other elements:
Basic selectors:
- div — Selects all <div> elements (type selector)
- .price — Selects elements with class="price" (class selector)
- #main — Selects the element with id="main" (ID selector)
- * — Selects all elements (universal selector)
Attribute selectors:
- [href] — Elements with an href attribute
- [data-type="product"] — Elements where data-type equals product
- [href^="https"] — Elements where href starts with https
- [class*="card"] — Elements where class contains card
Combinators:
- div p — All <p> elements inside <div> (descendant)
- div > p — <p> elements that are direct children of <div> (child)
- h2 + p — <p> immediately after <h2> (adjacent sibling)
- h2 ~ p — All <p> siblings after <h2> (general sibling)
Pseudo-classes:
- :first-child — First child element
- :nth-child(2) — Second child element
- :not(.hidden) — Elements that don't have the hidden class
Selectors can be combined for precision: table.data tbody tr:nth-child(odd) td:first-child selects the first cell in every odd row of a table with the class data.
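To make the matching rules above concrete, here is a minimal sketch of a selector engine using only Python's standard library. It is an illustration, not how any particular library implements selectors. It assumes well-formed HTML with explicit closing tags and supports only type, class, ID, and universal selectors plus the descendant and child combinators. The names Node, TreeBuilder, parse, matches, and select are hypothetical.

```python
import re
from html.parser import HTMLParser


class Node:
    """One element in a deliberately tiny DOM tree."""

    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children, self.text = [], ""


class TreeBuilder(HTMLParser):
    """Builds the tree; assumes every element has an explicit closing tag."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root", [])

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def matches(node, compound):
    """Test one compound selector, e.g. 'div.product', '.price', '#main' or '*'."""
    if node.parent is None:  # the synthetic root never matches
        return False
    tag = re.match(r"[A-Za-z][\w-]*", compound)
    if tag and tag.group() != node.tag:
        return False
    classes = (node.attrs.get("class") or "").split()
    if any(c not in classes for c in re.findall(r"\.([\w-]+)", compound)):
        return False
    return all(node.attrs.get("id") == i for i in re.findall(r"#([\w-]+)", compound))


def walk(node):
    """Yield all descendants in document order."""
    for child in node.children:
        yield child
        yield from walk(child)


def select(root, selector):
    """Return elements matching a selector built from compounds, ' ' and '>'."""
    steps, combinator = [], " "
    for token in selector.replace(">", " > ").split():
        if token == ">":
            combinator = ">"
        else:
            steps.append((combinator, token))
            combinator = " "

    def ok(node, i):
        combinator, compound = steps[i]
        if not matches(node, compound):
            return False
        if i == 0:
            return True
        parent = node.parent
        if combinator == ">":  # child: the direct parent must match
            return ok(parent, i - 1)
        while parent is not None:  # descendant: any ancestor may match
            if ok(parent, i - 1):
                return True
            parent = parent.parent
        return False

    return [node for node in walk(root) if ok(node, len(steps) - 1)]


HTML = """
<div id="main">
  <div class="product featured">
    <h2 class="title">Widget</h2>
    <section><h2 class="title">Nested</h2></section>
  </div>
  <div class="other"><h2 class="title">Ignored</h2></div>
</div>
"""

root = parse(HTML)
print([n.text for n in select(root, "div.product > h2.title")])  # ['Widget']
print([n.text for n in select(root, "div.product h2.title")])    # ['Widget', 'Nested']
```

The sketch checks the rightmost compound first and then walks up through parents, which is why the child combinator only ever inspects one parent while the descendant combinator may inspect every ancestor.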
CSS Selectors in Web Scraping
CSS selectors are the primary tool for pinpointing data in scraped HTML, and the major scraping libraries support them. Typical uses include:
- Element targeting — Select the exact elements containing the data you need: prices, titles, descriptions, links, images, or any other content.
- Batch extraction — Use a single selector to match all instances of a repeating pattern (e.g., all product cards on a listing page) and iterate over them.
- Nested extraction — Combine selectors to navigate complex page structures. First select a container, then use relative selectors within it to extract individual fields.
- Resilience — Well-crafted selectors using data attributes ([data-testid="price"]) are more resistant to layout changes than selectors based on class names or tag hierarchy.
Compared to XPath (the other common selection language), CSS selectors are more concise, easier to read, and more familiar to frontend developers. Tool support differs: Puppeteer and Playwright accept both CSS selectors and XPath, while Beautiful Soup and Cheerio support CSS selectors but have no built-in XPath support.
Choosing the right selector strategy is crucial for maintainable scrapers: overly specific selectors break when the site is redesigned, and overly broad selectors capture unwanted elements.
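The batch and nested extraction patterns described above can be sketched with Python's standard library alone. ElementTree has no CSS engine, so this sketch swaps in its limited XPath subset and notes the equivalent CSS selector beside each query. The markup, the field names, and the extract_products function are hypothetical, and the input must be well-formed XML for ElementTree to parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing page; well-formed so that ElementTree can parse it.
LISTING = """
<ul id="results">
  <li class="card"><h3>Blue Widget</h3><span data-testid="price">$19.99</span><a href="/p/1">View</a></li>
  <li class="card"><h3>Red Widget</h3><span data-testid="price">$24.50</span><a href="/p/2">View</a></li>
  <li class="ad"><h3>Sponsored</h3></li>
</ul>
"""


def text_of(element):
    """Return an element's text, or None when the query found nothing."""
    return element.text if element is not None else None


def extract_products(markup):
    root = ET.fromstring(markup)
    products = []
    # Batch step: one query matches every repeating container.
    # CSS equivalent: li.card
    for card in root.findall(".//li[@class='card']"):
        # Nested step: these queries are relative to the card, not the page.
        title = card.find("./h3")                           # CSS: :scope > h3
        price = card.find(".//span[@data-testid='price']")  # CSS: span[data-testid="price"]
        link = card.find(".//a[@href]")                     # CSS: a[href]
        products.append({
            "title": text_of(title),
            "price": text_of(price),
            "url": link.get("href") if link is not None else None,
        })
    return products


for product in extract_products(LISTING):
    print(product)
```

One difference worth noting: the XPath predicate [@class='card'] compares the whole attribute value, whereas the CSS selector li.card also matches an element with class="card featured". With a CSS-capable library such as Beautiful Soup, the same two steps would typically use soup.select("li.card") followed by card.select_one(...) on each result.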
How SimpleCrawl Handles CSS Selectors
SimpleCrawl supports CSS selectors as a first-class feature in its extraction API:
- Selector-based extraction — Pass one or more CSS selectors to the API and receive only the matching elements, already parsed and cleaned.
- Named selectors — Define a map of field names to selectors (e.g., { "title": "h1.product-name", "price": "span.price" }) and get back structured JSON with your field names as keys.
- Automatic fallbacks — SimpleCrawl can try multiple selectors for the same field, using the first one that matches. This handles sites that use different markup across pages.
- Content extraction — By default, SimpleCrawl returns the text content of matched elements, but you can request inner HTML, outer HTML, or specific attributes instead.
- Selector testing — Use SimpleCrawl's playground to test selectors against live pages before integrating them into your pipeline.
Whether you're extracting a single data point or building a full structured dataset, CSS selectors with SimpleCrawl give you precise control over what data comes back.
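The idea behind named selectors with fallbacks can be illustrated locally. The sketch below is not SimpleCrawl's API, and it makes no claim about SimpleCrawl's request or response format. It only shows the first-match-wins logic under stated assumptions: extract_fields is a hypothetical helper, and query stands in for any selector engine that returns the text of matching elements.

```python
from typing import Callable, Dict, List, Optional


def extract_fields(
    field_selectors: Dict[str, List[str]],
    query: Callable[[str], List[str]],
) -> Dict[str, Optional[str]]:
    """Resolve each field with the first candidate selector that matches."""
    record: Dict[str, Optional[str]] = {}
    for field, candidates in field_selectors.items():
        record[field] = None  # stays None if no candidate matches
        for selector in candidates:
            found = query(selector)
            if found:
                record[field] = found[0]
                break
    return record


# Stand-in for a real selector engine: maps a selector to the text it matched.
PAGE_MATCHES = {
    "h1.product-name": ["Blue Widget"],
    '[data-testid="price"]': ["$19.99"],
}

# Field names mapped to candidate selectors, tried in order.
FIELDS = {
    "title": ["h1.product-name"],
    "price": ["span.price", '[data-testid="price"]'],
    "sku": ["span.sku"],
}

record = extract_fields(FIELDS, lambda selector: PAGE_MATCHES.get(selector, []))
print(record)  # {'title': 'Blue Widget', 'price': '$19.99', 'sku': None}
```

Here span.price matches nothing on the stand-in page, so the price field falls back to the data attribute selector, and sku stays None because none of its candidates match.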
Related Terms
- HTML Parsing — Converting raw HTML into a queryable DOM tree
- Web Scraping — Automated data extraction from websites
- Structured Data — Machine-readable data formats on web pages
- Headless Browser — A browser without a GUI for rendering JavaScript