Web Scraping Pagination Patterns Guide

A practical reference for scraping pagination across next buttons, load more flows, infinite scroll, and hidden APIs.

Pagination is one of the first places a web scraping project breaks. A site that looked simple on day one may quietly switch from numbered pages to a next button, from a load more control to infinite scroll, or from visible HTML to an internal JSON endpoint. This guide is a practical reference for recognizing the most common pagination patterns, choosing a stable extraction strategy, and knowing when to move from HTML scraping to network-level collection. If you need a durable mental model for scraping paginated lists across changing frontends, this is the page to keep handy.

Overview

The term pagination covers every way a site splits a long list into retrievable chunks. For scraping, that matters because your extractor usually needs more than the first screen of results. Product catalogs, job listings, article archives, search results, and directory pages all expose records in batches. The trick is that the user interface does not always reveal the real delivery mechanism.

In practice, most list pages fall into five broad patterns:

Traditional numbered pagination: page=2, page=3, or visible page links.
Next button pagination: a single forward control without obvious page numbers.
Load more pagination: a button appends another batch below the current list.
Infinite scroll: new results load automatically as the viewport approaches the bottom.
Background API pagination: the page shell stays mostly unchanged while JavaScript fetches data from JSON or GraphQL endpoints.

These patterns often overlap. A page can render an initial HTML block, then continue with API calls after the first interaction. A visible next button can merely trigger XHR requests. Infinite scroll may still use numbered cursors under the hood. That is why reliable scraping starts with observation, not assumptions.

A good workflow is:

Inspect the first page HTML.
Check the browser network panel while clicking or scrolling.
Identify whether the next batch is delivered as full HTML, partial HTML, or structured JSON.
Choose the simplest stable path: direct URL requests if possible, API requests if available, browser automation only when needed.
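
Step three of this workflow can be partly mechanized once you have copied a response out of the network panel. The sketch below is a rough heuristic, not a standard: the function name classify_payload and the labels it returns are assumptions made for illustration.

```python
import json


def classify_payload(content_type: str, body: str) -> str:
    """Label a response as 'json', 'full_html', 'partial_html', or 'unknown'."""
    ctype = content_type.lower()
    text = body.lstrip()
    if "json" in ctype:
        return "json"
    # Some endpoints send JSON with a generic content type, so try parsing it.
    if text[:1] in ("{", "["):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass
    head = text[:500].lower()
    # A doctype or <html> tag suggests a complete document, not a fragment.
    if "<!doctype html" in head or "<html" in head:
        return "full_html"
    if text.startswith("<"):
        return "partial_html"
    return "unknown"
```

A "json" or "partial_html" result on the second batch is a hint that direct requests may be able to replace browser automation.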

If you are choosing a stack for that workflow, see “Playwright vs BeautifulSoup vs Selenium for Web Scraping: Which Stack Fits Your Use Case?” If the page depends heavily on client-side rendering, “How to Scrape JavaScript-Heavy Websites in 2026” is a useful companion.

Core concepts

This section gives you a pattern library you can apply repeatedly, even as sites change implementation details.

1. Numbered and URL-based pagination

This is the most straightforward pattern. You may see links like ?page=2, ?p=3, /page/4/, or offsets such as ?offset=24. In many cases, a simple HTTP client plus an HTML parser is enough.

What to look for:

Anchor tags pointing to other pages.
Canonical URL patterns in page links.
Query parameters for page, offset, size, sort, or cursor.
Total result counts that let you estimate the stopping point.

Good stopping rules:

No next link found.
Response returns an empty list.
The set of extracted item IDs repeats.
A known max page is reached.

Main risk: page numbers may exist for users, while actual content loads asynchronously after initial render. If extracted HTML contains placeholders but no real records, inspect network activity before building a loop.
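
The stopping rules above can be combined in one loop. This is a minimal sketch, assuming a hypothetical fetch_page callable that you supply (for example, an HTTP request plus an HTML parser) and that returns a list of dicts carrying an "id" key. Neither the name nor the record shape comes from a particular library.

```python
from typing import Callable, Dict, Iterator, List


def crawl_numbered_pages(
    fetch_page: Callable[[int], List[Dict]],
    max_pages: int = 500,
) -> Iterator[Dict]:
    """Yield unique items from page 1 onward until a stopping rule fires."""
    seen_ids = set()
    for page in range(1, max_pages + 1):  # known max page: guards against endless loops
        items = fetch_page(page)
        if not items:  # empty list: we are past the last page
            break
        new_count = 0
        for item in items:
            if item["id"] in seen_ids:
                continue
            seen_ids.add(item["id"])
            new_count += 1
            yield item
        if new_count == 0:  # every ID already seen: the site is repeating a page
            break
```

Keeping the fetch step injectable means the same loop works whether a page comes from a ?page=N URL, a /page/N/ path, or an offset parameter.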

2. Next button scraping

A next page scraper often starts by selecting a single forward button repeatedly. That sounds simple, but the button may be a standard link, a JavaScript event handler, or a disabled element that changes only after data arrives.

What to look for:

An <a> element with an href to the next page.
A button that updates list content in place.
ARIA labels such as “next”, “forward”, or “older”.
Disabled states on the last page.

Reliable approach:

Extract records from the current page.
Store a page fingerprint, such as the first and last item ID.
Trigger the next action.
Wait for a specific list change, not just a fixed timeout.
Stop if the fingerprint repeats or no new items appear.
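
The five steps above can be sketched independently of any browser library. The two callables here are hypothetical placeholders: read_ids returns the item IDs currently on the page, and go_next triggers the next action, waits for the list to change, and returns False when no forward control is available. How you implement them depends on your automation tool.

```python
from typing import Callable, List, Optional, Tuple


def fingerprint(ids: List[str]) -> Tuple[int, Optional[str], Optional[str]]:
    """Summarize a page by its size plus its first and last item ID."""
    return (len(ids), ids[0], ids[-1]) if ids else (0, None, None)


def follow_next(
    read_ids: Callable[[], List[str]],
    go_next: Callable[[], bool],
    max_pages: int = 200,
) -> List[str]:
    """Collect IDs page by page until the fingerprint repeats or nothing new appears."""
    collected: List[str] = []
    seen = set()
    seen_fingerprints = set()
    for _ in range(max_pages):
        ids = read_ids()
        fp = fingerprint(ids)
        if fp in seen_fingerprints:  # same page rendered again: stop
            break
        seen_fingerprints.add(fp)
        fresh = []
        for item_id in ids:
            if item_id not in seen:
                seen.add(item_id)
                fresh.append(item_id)
        if not fresh:  # page changed shape but brought no new items
            break
        collected.extend(fresh)
        if not go_next():  # button missing or disabled: last page reached
            break
    return collected
```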

This is where selectors matter more than people expect. If your scraper targets the button by brittle class names generated by a frontend framework, it may fail after a minor redesign. Prefer semantic anchors, ARIA labels, visible text, or structural selectors where possible. For a deeper selector strategy, see “Best XPath and CSS Selector Strategies for Web Scraping: A Living Guide”.
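
As one illustration of preferring semantic hooks over generated class names, the sketch below looks for a link marked with rel="next" or an ARIA label containing "next", using only the Python standard library. The class and function names are hypothetical, and real pages may need additional signals such as visible text.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> that is semantically marked as 'next'."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        label = attr.get("aria-label", "").lower()
        # Class names are deliberately ignored: they change with redesigns.
        if attr.get("href") and ("next" in rel_tokens or "next" in label):
            self.href = attr["href"]


def find_next_url(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None if no next link exists."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None
```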

3. Load more button scraping

Load more pagination is common on e-commerce, media archives, and directory pages. Instead of replacing the page, the site appends another batch to the current DOM. This changes the extraction logic: if you scrape the whole page after every click, you may accidentally duplicate older records.

What to look for:

A “Load more” or “Show more” button near the end of the list.
An increase in DOM node count after each click.
Network requests carrying page size, offset, or cursor values.
A button that disappears when the list is exhausted.

Best practice:

Track item identifiers and deduplicate aggressively.
Extract only newly appended nodes when possible.
Watch for hidden request parameters such as category filters or session values.
Stop when the button vanishes, becomes disabled, or a request returns no new records.
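
Because a load more page keeps older items in the DOM, each read returns the whole list again. A small tracker like the one below, a sketch assuming each scraped item is a dict with an "id" key, returns only what is new since the last click. The class name AppendTracker is an illustrative choice.

```python
from typing import Dict, Iterable, List


class AppendTracker:
    """Remember which item IDs were already stored across 'load more' clicks."""

    def __init__(self) -> None:
        self.seen = set()

    def new_items(self, dom_items: Iterable[Dict]) -> List[Dict]:
        """Return only items whose ID has not been seen in earlier snapshots."""
        fresh = []
        for item in dom_items:
            if item["id"] not in self.seen:
                self.seen.add(item["id"])
                fresh.append(item)
        return fresh
```

An empty result after a click is one of the stop signals listed above: the request returned no new records.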

For load more button scraping, browser automation is often the fastest way to prototype, but not always the best long-term method. If the click simply triggers a JSON endpoint, reproducing that request directly is usually faster, cheaper, and easier to maintain.

4. Infinite scroll scraping

Infinite scroll scraping is less about “scrolling” and more about detecting what event causes the next request. Some sites fetch a new batch whenever the viewport hits a threshold. Others use an IntersectionObserver watching a sentinel element near the bottom. In both cases, scrolling is only a trigger for a backend request.

What to look for:

New requests in the network panel when you scroll downward.
A hidden cursor token in request payloads.
A list container that grows while the URL stays unchanged.
Virtualized lists where only visible items exist in the DOM at any moment.

Main challenge: not all loaded items remain in the DOM. Some frontend frameworks virtualize long lists for performance. If your script reads only current DOM nodes at the end, it may miss earlier records that were removed from view.

Safer tactics:

Capture records incrementally after each batch loads.
Intercept network responses if possible.
Store unique IDs immediately instead of waiting until the end.
Use a stop condition based on repeated cursor values, missing requests, or no increase in unique items.
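
The incremental-capture tactic can be sketched as follows. The scroll_and_read callable is a hypothetical placeholder for whatever scrolls the page and returns the records currently visible; the patience parameter (how many rounds without new IDs to tolerate) is an assumption you would tune per site.

```python
from typing import Callable, Dict, List


def collect_while_scrolling(
    scroll_and_read: Callable[[], List[Dict]],
    patience: int = 2,
    max_rounds: int = 1000,
) -> List[Dict]:
    """Store records after every scroll; stop when unique IDs stop growing."""
    records: Dict[str, Dict] = {}
    stale_rounds = 0
    for _ in range(max_rounds):
        visible = scroll_and_read()
        before = len(records)
        for record in visible:
            # Save immediately: a virtualized list may drop this row later.
            records.setdefault(record["id"], record)
        if len(records) == before:
            stale_rounds += 1
            if stale_rounds >= patience:
                break
        else:
            stale_rounds = 0
    return list(records.values())
```

Tolerating more than one stale round helps when a slow request makes a scroll look empty even though more data is on the way.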

If the page is entirely JavaScript-driven, it often pays to start with a browser-based workflow so you can observe how batches load, then optimize later by replaying the underlying requests directly. That transition path is covered well in browser rendering guides and Playwright scraping tutorials.

5. API and background request pagination

This is often the cleanest path, even when the user sees a complex frontend. The browser may call REST, GraphQL, or internal JSON endpoints that return structured data with offsets, cursors, page sizes, and metadata. If those requests can be reproduced consistently, they are usually preferable to scraping rendered HTML.

What to look for in the network panel:

Fetch or XHR requests after clicking next, load more, or scrolling.
JSON payloads containing arrays of items.
Parameters like cursor, offset, limit, page, endCursor, or hasNextPage.
Request headers that must be preserved, such as auth tokens, CSRF values, or locale settings.

Benefits:

Cleaner data structures.
Less fragile than CSS selectors.
Better speed and lower resource use.
Easier deduplication using native IDs.

Caution: some endpoints are tightly coupled to browser state. Tokens may expire. GraphQL variables may change shape. Anti-automation checks may be stricter at the API layer than in page HTML. Even so, discovering the endpoint gives you insight into how pagination really works.
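
Once an endpoint is understood, the pagination loop itself is usually short. The sketch below assumes a response shaped like {"items": [...], "pageInfo": {"hasNextPage": ..., "endCursor": ...}}; that shape and the fetch(cursor, limit) callable are assumptions for illustration, and real endpoints will differ in field names and nesting.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_cursor_api(
    fetch: Callable[[Optional[str], int], Dict[str, Any]],
    limit: int = 50,
    max_requests: int = 1000,
) -> Iterator[Any]:
    """Yield items from a cursor-paginated endpoint until it reports the end."""
    cursor: Optional[str] = None
    seen_cursors = set()
    for _ in range(max_requests):
        payload = fetch(cursor, limit)
        yield from payload.get("items", [])
        page_info = payload.get("pageInfo", {})
        next_cursor = page_info.get("endCursor")
        if not page_info.get("hasNextPage") or not next_cursor:
            break  # the server says there is nothing more
        if next_cursor in seen_cursors:
            break  # repeated cursor: avoid looping forever
        seen_cursors.add(next_cursor)
        cursor = next_cursor
```

In practice the fetch callable is where preserved headers, auth tokens, and locale settings would be attached.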

6. Cursors, offsets, and stopping conditions

The biggest difference between a toy scraper and a durable one is usually the stop logic. You do not want an endless loop, partial extraction, or silent duplication.

Common pagination controls include:

Page numbers: easy to reason about but not always exposed.
Offsets: fetch items 0-19, then 20-39, and so on (offset=0, then offset=20, with a page size of 20).
Cursors: opaque tokens pointing to the next batch.
Timestamps or IDs: fetch items after a specific point.
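
Offset arithmetic is an easy place to introduce an off-by-one overlap between batches. As a small worked example, this sketch yields each offset together with the inclusive index of the last item it covers, assuming zero-based indexing and a known total. The function name is hypothetical.

```python
from typing import Iterator, Tuple


def offset_windows(total: int, limit: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, last_index) pairs that cover `total` items without overlap."""
    for offset in range(0, total, limit):
        last_index = min(offset + limit, total) - 1
        yield offset, last_index
```

With 45 items and a page size of 20, the windows are (0, 19), (20, 39), and (40, 44): three requests, with the final one shorter than the rest.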

Strong stop conditions combine more than one signal:

No next token returned.
No new unique IDs found.
Response count below expected batch size.
Known item count reached.
Repeated request signature or repeated first item.

This matters especially when sites recycle records, inject sponsored items, or reorder lists during scraping.
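
One way to combine these signals is to compute all of them for each batch and let the caller decide how many must agree. The sketch below is illustrative only: the function name, the reason labels, and the parameters are assumptions, not part of any library.

```python
from typing import List, Optional, Set


def stop_reasons(
    batch_ids: List[str],
    seen_ids: Set[str],
    next_token: Optional[str],
    expected_batch_size: int,
    total_known: Optional[int] = None,
) -> List[str]:
    """Return every stop signal that applies to the batch just fetched."""
    reasons = []
    batch_set = set(batch_ids)
    if not next_token:
        reasons.append("no_next_token")
    if not (batch_set - seen_ids):
        reasons.append("no_new_ids")
    if len(batch_ids) < expected_batch_size:
        reasons.append("short_batch")
    if total_known is not None and len(seen_ids | batch_set) >= total_known:
        reasons.append("known_count_reached")
    return reasons
```

A short batch on its own is a weak signal, since injected or filtered items can change batch sizes, so a cautious caller might stop only when two reasons are present or when the next token is missing.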

Related terms

Pagination discussions quickly overlap with adjacent frontend and scraping concepts. Understanding the differences will save time when debugging.

Server-side rendering (SSR): HTML is largely present in the response. Traditional requests and parsers often work well.
Client-side rendering (CSR): JavaScript assembles the page after load. The HTML response may be sparse, so browser tools or direct API calls become more important.
Hydration: static or server-rendered content becomes interactive after JavaScript loads. A page may look complete but still rely on API calls for subsequent pagination.
XHR/fetch interception: capturing network calls made by the page. This is often the fastest route to understanding hidden pagination.
Virtualized list: only a subset of rows exists in the DOM at one time. Common in large result sets and admin-style web apps.
Cursor-based pagination: instead of page numbers, the server returns a token for the next request. Common in APIs and GraphQL.
Rate limiting: the target restricts request volume. Pagination loops need pacing and retry logic.
Deduplication: preventing the same item from being stored multiple times across pages or batches.
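
For the pacing and retry logic mentioned under rate limiting, a common approach is exponential backoff with jitter. The sketch below assumes your fetch code raises a hypothetical RetryableError for responses worth retrying (for example, HTTP 429); the delay values are placeholders, and the sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Any, Callable


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, such as rate limits."""


def fetch_with_retry(
    fetch: Callable[[], Any],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(), backing off exponentially between retryable failures."""
    for attempt in range(attempts):
        try:
            return fetch()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so parallel workers do not sync up.
            sleep(delay + random.uniform(0, delay / 2))
```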

It also helps to distinguish navigation from data transport. A next button is navigation. The actual transport may be HTML, JSON, GraphQL, or streamed partial markup. When developers confuse the two, they overuse browser automation where a simple request loop would have been enough.

Practical use cases

Here is how these pagination patterns show up in real scraping work, and how to respond without overengineering the first version.

E-commerce category scraping

Retail category pages commonly mix visible filters, load more controls, and API-backed product grids. Start by checking whether filter changes and pagination clicks call a JSON endpoint. If yes, collect product IDs, names, prices, stock hints, and URLs from the response, then enrich with product-page scraping only when needed. This reduces DOM parsing and avoids duplicate extraction from appended grids.

Job boards and company directories

These often use next buttons or infinite scroll with offset-based APIs. The list page may expose summary fields, while detail pages hold richer company or role data. A practical design is two-stage scraping: first collect all listing URLs and IDs using the most stable pagination route, then process detail pages separately. That keeps pagination logic isolated and easier to repair when the frontend changes.
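
The two-stage design can be sketched as two small functions with a plain dict acting as the queue between them. All names here are hypothetical, listing rows are assumed to carry "id" and "url" keys, and fetch_detail stands in for whatever retrieves and parses a detail page.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_listings(batches: Iterable[List[Dict]]) -> Dict[str, str]:
    """Stage 1: map each listing ID to its detail URL, ignoring repeats."""
    queue: Dict[str, str] = {}
    for batch in batches:
        for row in batch:
            queue.setdefault(row["id"], row["url"])
    return queue


def fetch_details(
    queue: Dict[str, str],
    fetch_detail: Callable[[str], Dict],
) -> Tuple[List[Dict], List[str]]:
    """Stage 2: fetch each detail page; keep failed IDs for a later retry."""
    records: List[Dict] = []
    failed: List[str] = []
    for item_id, url in queue.items():
        try:
            detail = fetch_detail(url)
        except Exception:  # a real scraper would catch narrower error types
            failed.append(item_id)
            continue
        records.append({"id": item_id, "url": url, **detail})
    return records, failed
```

Because stage 1 only depends on the pagination route, a frontend change usually means repairing collect_listings' input while stage 2 stays untouched.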

If your downstream goal is research or market mapping, this structure fits well with workflows like enriching scraped company lists with technographic and hiring signals or ranking vendors using scraped signals.

News archives and content libraries

Archives often still use numbered pages, which is good for durability. But some modern media sites render article cards through JavaScript and lazy-load images or summaries. In these cases, test whether the article URLs are already present in the first HTML response. If yes, a lightweight parser may be enough. If not, inspect network calls before committing to a browser-driven scraper.

Search results with changing sort order

Pagination becomes harder when result order is unstable. New listings may enter the top of the set while you are scraping page three or four. Here, deduplication by unique ID is essential, and the safest strategy may be shorter scrape windows, snapshotting query parameters, or collecting from newest to oldest using stable sort settings where available.

Single-page applications and internal dashboards

These frequently rely on GraphQL or JSON APIs with cursor-based pagination. The visible interface may feel impossible to scrape from HTML, but the network layer can be surprisingly structured. When you find a connection between the UI action and a query payload, log variables carefully. Cursor fields, page-size defaults, and nested edges or nodes often define the whole extraction plan.
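
Many GraphQL APIs nest results as edges containing nodes, alongside a pageInfo object. The sketch below assumes that connection-style shape and a caller-supplied path to the connection inside the "data" object; the function name and the example field names are hypothetical, and a given site's schema may differ.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def parse_connection(
    response: Dict[str, Any],
    path: Sequence[str],
) -> Tuple[List[Any], Optional[str]]:
    """Return (nodes, next_cursor) from a connection-style GraphQL response."""
    node = response["data"]
    for key in path:  # walk down to the connection object
        node = node[key]
    items = [edge["node"] for edge in node.get("edges", [])]
    info = node.get("pageInfo", {})
    # Only trust endCursor when the server says another page exists.
    next_cursor = info.get("endCursor") if info.get("hasNextPage") else None
    return items, next_cursor
```

The returned cursor is what you would pass back as the query's cursor variable on the next request; a None value means the extraction is complete.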

For pages built on JavaScript-heavy frameworks, it can help to start with a rendered browser session, then simplify after you understand the transport layer. That pattern aligns with browser rendering comparisons and selector work from the site’s scraping guides.

A simple diagnostic checklist

When you land on any paginated list, run this checklist before writing production code:

Can I see item links in raw HTML?
Does clicking next change the URL?
Does clicking next trigger a network request?
Does scrolling trigger a request?
Do responses contain unique item IDs?
Is the DOM replacing old items or appending new ones?
What tells me I have reached the end?
What fields do I need from the list page versus detail pages?

Those eight questions resolve most pagination uncertainty quickly.

When to revisit

Pagination logic should be revisited whenever a target site changes frontend behavior, extraction quality drops, or maintenance cost starts exceeding the value of the current approach. In practice, this means revisiting the topic when:

A site switches from page links to load more or infinite scroll.
Your scraper begins returning fewer records than expected.
Selectors still work, but network behavior changes.
Duplicate records suddenly increase.
A browser-based scraper becomes too slow or expensive and may be replaceable with direct requests.
A previously accessible API starts returning different cursor or token formats.

The most useful update habit is to treat pagination as a monitored component, not a one-time setup. Keep a small validation routine that checks:

Expected batch size.
Growth in unique record count.
Presence of next-page controls or next tokens.
Key response fields used for parsing.
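
A validation routine along these lines can be very small. The sketch below checks the four points above for a single batch; the function name, the problem labels, and the expect_more flag (whether you believe more records remain, for example from a total count) are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Set


def validate_batch(
    items: List[Dict],
    seen_ids: Set[str],
    expected_size: int,
    required_fields: Sequence[str],
    has_next: bool,
    expect_more: bool,
) -> List[str]:
    """Return a list of problem labels; an empty list means the batch looks healthy."""
    problems = []
    # A short batch is only suspicious if more records should remain.
    if expect_more and len(items) < expected_size:
        problems.append("short_batch")
    new_ids = {item.get("id") for item in items} - seen_ids - {None}
    if items and not new_ids:
        problems.append("no_unique_growth")
    if expect_more and not has_next:
        problems.append("missing_next")
    for field in required_fields:
        if any(field not in item for item in items):
            problems.append("missing_field:" + field)
    return problems
```

Logging these labels per run gives you an early signal of pagination drift before record counts visibly drop.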

When that validation fails, do not start by rewriting the whole scraper. Re-open the page in a browser, repeat the diagnostic checklist, and confirm whether the site changed its navigation, transport, or DOM structure. Small frontend changes often only require switching from one pattern in this guide to another.

Action plan for your next scraper:

Classify the page as numbered, next button, load more, infinite scroll, or API-backed.
Inspect network traffic before choosing tools.
Prefer direct requests where the data transport is stable.
Use browser automation when interaction or rendering is genuinely required.
Store unique IDs from the start and deduplicate throughout.
Define a stop condition before you run the full job.
Log enough metadata to detect pagination drift later.

That process is what makes pagination scraping manageable over time. Frontends will continue to change, but the core question stays the same: what is the site using to deliver the next batch of records, and what is the simplest reliable way to capture it?
