
What are CSS Selectors? — SimpleCrawl Glossary

CSS selectors are patterns used to select and target specific HTML elements in a web page. In web scraping, they are the primary way to locate and extract data.

Definition

CSS selectors are pattern-matching expressions used to target specific HTML elements in a document. Originally designed for applying styles in Cascading Style Sheets (CSS), selectors have become the standard way to locate and extract elements in web scraping, browser automation, and DOM manipulation.

A CSS selector like div.product > h2.title targets all <h2> elements with the class title that are direct children of a <div> with the class product. This precision makes selectors invaluable for extracting specific data points from complex web pages.
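
As a minimal sketch of the example above, the following snippet runs that selector with Beautiful Soup (assumed to be installed as the beautifulsoup4 package). The HTML is made up for illustration; it includes a nested heading and a sidebar heading to show what the child combinator excludes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
HTML = """
<div class="product">
  <h2 class="title">Blue Widget</h2>
  <section><h2 class="title">Nested heading</h2></section>
</div>
<div class="sidebar">
  <h2 class="title">Related items</h2>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Match only <h2 class="title"> elements whose parent is <div class="product">.
titles = [h2.get_text(strip=True) for h2 in soup.select("div.product > h2.title")]
print(titles)  # ['Blue Widget']
```

The nested heading is skipped because its parent is the <section>, not the <div>, and the sidebar heading is skipped because its parent lacks the product class.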

How CSS Selectors Work

CSS selectors match elements in the DOM tree that a parser builds from the HTML, based on their tag names, attributes, classes, IDs, and relationships to other elements:

Basic selectors:

  • div — Selects all <div> elements (type selector)
  • .price — Selects elements with class="price" (class selector)
  • #main — Selects the element with id="main" (ID selector)
  • * — Selects all elements (universal selector)

Attribute selectors:

  • [href] — Elements with an href attribute
  • [data-type="product"] — Elements where data-type equals product
  • [href^="https"] — Elements where href starts with https
  • [class*="card"] — Elements whose class attribute contains the substring card (this also matches values such as discard)

Combinators:

  • div p — All <p> elements inside <div> (descendant)
  • div > p — <p> elements that are direct children of <div> (child)
  • h2 + p — <p> immediately after <h2> (adjacent sibling)
  • h2 ~ p — All <p> siblings after <h2> (general sibling)

Pseudo-classes:

  • :first-child — Elements that are the first child of their parent
  • :nth-child(2) — Elements that are the second child of their parent
  • :not(.hidden) — Elements that don't have the hidden class

Selectors can be combined for precision: table.data tbody tr:nth-child(odd) td:first-child selects the first cell in every odd row of a table with the class data.
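
The selector types listed above can be tried out with a short script. This is a sketch that assumes Beautiful Soup (the beautifulsoup4 package) is installed; the markup and the texts helper are hypothetical and exist only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
HTML = """
<div id="main">
  <h2>Laptops</h2>
  <p class="intro">Intro text</p>
  <p class="hidden">Internal note</p>
  <a href="https://example.com/a" data-type="product">A</a>
  <a href="/b" data-type="category">B</a>
</div>
<table class="data"><tbody>
  <tr><td>r1c1</td><td>r1c2</td></tr>
  <tr><td>r2c1</td><td>r2c2</td></tr>
  <tr><td>r3c1</td><td>r3c2</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def texts(selector):
    """Return the stripped text of every element matching the selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


SELECTORS = [
    "p",                        # type selector
    ".intro",                   # class selector
    "#main h2",                 # ID selector plus descendant combinator
    '[data-type="product"]',    # attribute equals
    'a[href^="https"]',         # attribute starts with
    "h2 + p",                   # adjacent sibling
    "h2 ~ p",                   # general sibling
    "p:not(.hidden)",           # negation pseudo-class
    "#main > :nth-child(2)",    # second child of #main
    "table.data tbody tr:nth-child(odd) td:first-child",
]

for selector in SELECTORS:
    print(f"{selector!r:55} -> {texts(selector)}")
```

With this markup, h2 + p matches only the intro paragraph, h2 ~ p matches both paragraphs, and the combined table selector returns the first cell of rows one and three.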

CSS Selectors in Web Scraping

CSS selectors are the primary tool for pinpointing data in scraped HTML. Every major scraping library supports them, and a few usage patterns come up repeatedly:

  • Element targeting — Select the exact elements containing the data you need: prices, titles, descriptions, links, images, or any other content.
  • Batch extraction — Use a single selector to match all instances of a repeating pattern (e.g., all product cards on a listing page) and iterate over them.
  • Nested extraction — Combine selectors to navigate complex page structures. First select a container, then use relative selectors within it to extract individual fields.
  • Resilience — Well-crafted selectors using data attributes ([data-testid="price"]) are more resistant to layout changes than selectors based on class names or tag hierarchy.
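
Batch and nested extraction from the list above might look like the following sketch. It assumes Beautiful Soup (the beautifulsoup4 package) is installed; the listing markup, class names, and field names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical listing page, invented for this example.
HTML = """
<ul class="listing">
  <li class="card">
    <h3 class="name">Blue Widget</h3>
    <span data-testid="price">$9.99</span>
    <a href="/widgets/blue">Details</a>
  </li>
  <li class="card">
    <h3 class="name">Red Widget</h3>
    <span data-testid="price">$12.50</span>
    <a href="/widgets/red">Details</a>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

products = []
# Batch extraction: one selector matches every repeating card.
for card in soup.select("ul.listing > li.card"):
    # Nested extraction: these selectors are evaluated inside the current card only.
    products.append({
        "name": card.select_one(".name").get_text(strip=True),
        "price": card.select_one('[data-testid="price"]').get_text(strip=True),
        "url": card.select_one("a[href]")["href"],
    })

for product in products:
    print(product)
```

Scoping the field selectors to each card keeps a name paired with the price from the same card, which a page-wide select for each field would not guarantee if a card were missing a field.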

Compared to XPath (the other common selection language), CSS selectors are more concise, easier to read, and more familiar to frontend developers. Support for the two varies by tool: Puppeteer and Playwright accept both CSS selectors and XPath, while Beautiful Soup and Cheerio support CSS selectors but have no built-in XPath engine.

Selecting the right CSS selector strategy is crucial for maintainable scrapers. Overly specific selectors break when the site redesigns. Overly broad selectors capture unwanted elements.
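
To make the trade-off concrete, the sketch below runs three selector styles against a page before and after a hypothetical redesign. Both HTML snippets are invented, Beautiful Soup (the beautifulsoup4 package) is assumed to be installed, and the example assumes the site keeps its data-testid attribute across the redesign, which a real site may not do.

```python
from bs4 import BeautifulSoup

# Hypothetical markup before a redesign.
OLD = """
<div id="app"><div class="col-md-8"><div class="box">
  <span class="price" data-testid="price">$9.99</span>
</div></div>
<aside><span class="price">$4.00 shipping</span></aside></div>
"""

# Hypothetical markup after the redesign: wrappers and class names changed,
# the data attribute was kept (an assumption of this example).
NEW = """
<div id="app"><main class="layout"><section class="panel">
  <span class="amount" data-testid="price">$9.99</span>
</section></main>
<aside><span class="price">$4.00 shipping</span></aside></div>
"""

SELECTORS = {
    "overly specific": "#app > div.col-md-8 > div.box > span.price",
    "overly broad": "span",
    "data attribute": '[data-testid="price"]',
}


def matches(html, selector):
    """Return the text of every element the selector matches in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


for label, selector in SELECTORS.items():
    print(f"{label:16} old={matches(OLD, selector)} new={matches(NEW, selector)}")
```

The overly specific selector returns nothing after the redesign, the overly broad one also captures the shipping cost on both versions, and the data-attribute selector returns only the product price in both.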

How SimpleCrawl Handles CSS Selectors

SimpleCrawl supports CSS selectors as a first-class feature in its extraction API:

  • Selector-based extraction — Pass one or more CSS selectors to the API and receive only the matching elements, already parsed and cleaned.
  • Named selectors — Define a map of field names to selectors (e.g., { "title": "h1.product-name", "price": "span.price" }) and get back structured JSON with your field names as keys.
  • Automatic fallbacks — SimpleCrawl can try multiple selectors for the same field, using the first one that matches. This handles sites that use different markup across pages.
  • Content extraction — By default, SimpleCrawl returns the text content of matched elements, but you can request inner HTML, outer HTML, or specific attributes instead.
  • Selector testing — Use SimpleCrawl's playground to test selectors against live pages before integrating them into your pipeline.
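
The ideas behind named selectors and fallbacks can be illustrated locally. The sketch below is not SimpleCrawl's API, whose request format is not shown here; it is a stand-in written with Beautiful Soup (assumed to be installed as beautifulsoup4), and the extract_fields function, the markup, and the sku field are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_fields(html, field_selectors):
    """Map each field name to the text of the first selector that matches.

    field_selectors maps a field name to either one selector string or a
    list of selectors tried in order. Fields with no match map to None.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for field, selectors in field_selectors.items():
        if isinstance(selectors, str):
            selectors = [selectors]
        result[field] = None
        for selector in selectors:
            element = soup.select_one(selector)
            if element is not None:
                result[field] = element.get_text(strip=True)
                break  # first matching selector wins
    return result


# Hypothetical page that marks up the price as div.cost instead of span.price.
HTML = '<h1 class="product-name">Blue Widget</h1><div class="cost">$9.99</div>'

record = extract_fields(HTML, {
    "title": "h1.product-name",
    "price": ["span.price", "div.cost"],  # fallback list, tried in order
    "sku": "[data-sku]",                  # no match on this page
})
print(record)
```

Here the price comes from the second selector in its list because span.price does not exist on this page, and sku is None because nothing matches.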

Whether you're extracting a single data point or building a full structured dataset, CSS selectors with SimpleCrawl give you precise control over what data comes back.
