I am the SEO director of Vitals, a comprehensive health care and doctor information and review site. There are over one million health professionals in the United States alone. This means a lot of categorization and pagination is necessary to organize all the providers by name, city and specialty. Our pagination strategy has changed several times to keep in step with Google’s latest recommendations. At the end of this article, I will present a Google recommendation history.
Pagination can occur in many formats. First is article pagination, when a single article spans two or more pages. Next is gallery pagination, when every item in a gallery has its own page. There is also forum pagination, where threads can span many pages. Category pagination is when listings span several pages. These lists can be products or anything else that can be placed in categories. A newer form is infinite scroll pagination, where data is pre-fetched from a subsequent page and added directly to the user's current page as they scroll down.
You should note every pagination type on your site and decide which of the pagination options shown below works best for each. You should also determine which pages in the series would provide additional value if surfaced in the Google index. An article spanning several pages should allow Google to read and index the keywords from the entire article. Likewise, on lists of products, you would want the search engines to have a crawlable path to all your product listings. A component page in a paginated series can be valuable as: 1) a component page with good content that completes the series, or 2) a crawlable path to reach the content of individual items.
The best time to deal with pagination structure is during the design process. This will avoid any issues with having to re-code or restructure post launch.
"Do what's good for the user"
Product managers often resist changing existing pagination by citing Matt Cutts: "do what's good for the user, not for search engines". This is certainly the top priority, but I would like to add one crucial element: "do what's valuable to searchers trying to find your business". Otherwise, there's no value to your content. You also need to code the pages properly, so the search engines can act as the intermediary between your users and your quality content. Over the last few years, Google has laid out instructions in the Google Webmaster forums on how paginated pages should be structured and coded.
What can be the problems with pagination?
Google will crawl all the pagination pages if you let it. However, Google's crawler has limited bandwidth for any one site. You don't want the crawler to get tied up in paginated pages, especially if the pagination pages do not add any Google indexation value over page one. Increasing the number of categories or items per page can decrease the depth of pagination.
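To make the last point concrete, here is a minimal sketch of the arithmetic. The function name and the listing counts are hypothetical and not taken from any real site.

```python
import math


def pagination_depth(total_items: int, items_per_page: int) -> int:
    """Return how many pages are needed to list every item."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    return math.ceil(total_items / items_per_page)


# Hypothetical category with 1,200 listings: showing more items per page
# means fewer paginated URLs for the crawler to work through.
shallow = pagination_depth(1200, 50)
deep = pagination_depth(1200, 10)
print(f"50 per page: {shallow} pages; 10 per page: {deep} pages")
```

The trade-off is page weight: larger pages load more slowly, so the items-per-page value is a balance rather than something to maximize.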
Incorrect code implementation can dilute page juice across the paginated pages, which will also prevent link juice from transferring to the pages that they link to.
Paginated pages are vulnerable to duplication filtering by the search engines. Coding paginated pages correctly lets the search engines know that they are pagination pages, so they will not be flagged as duplicates.
A lot of paginated pages do not have a significant amount of quality content on them. The Panda algorithm can penalize an entire site if it finds too much low quality content. Thankfully, Google has given us relatively clear guidelines on best practices for pagination. Here are some pertinent excerpts from Google's recommendations.
A brief history on Google's recommendations:
http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
9/15/2011 Google Webmaster Central
Here are three options for a series:
- Leave whatever you have exactly as-is. Paginated content exists throughout the web and we'll continue to strive to give searchers the best result, regardless of the page's rel="next"/rel="prev" HTML markup, or lack thereof.
- If you have a view-all page, or are considering a view-all page, see our post on View-all in search results.
- Hint to Google the relationship between the component URLs of your series with rel="next" and rel="prev". This helps us more accurately index your content and serve to users the most relevant page (commonly the first page). Implementation details below.
A few points to mention:
- The first page only contains rel="next" and no rel="prev" markup.
- Pages two to the second-to-last page should be doubly-linked with both rel="next" and rel="prev" markup.
- The last page only contains markup for rel="prev", not rel="next".
- rel="next" and rel="prev" values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL.
- rel="next" and rel="prev" only need to be declared within the <head> section, not within the document <body>.
- We allow rel="previous" as a syntactic variant of rel="prev" links.
- rel="next" and rel="previous" on the one hand and rel="canonical" on the other constitute independent concepts.
- rel="prev" and rel="next" act as hints to Google, not absolute directives.
- When implemented incorrectly, such as omitting an expected rel=”prev” or rel=”next” designation in the series, we’ll continue to index the page(s), and rely on our own heuristics to understand your content.
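The first three points above can be sketched as a small helper that builds the head markup for any page in a series. This is only an illustration: the function name and the ?page=N URL scheme are my assumptions, not part of Google's post.

```python
from html import escape


def pagination_link_tags(base_url: str, page: int, last_page: int) -> list[str]:
    """Build rel="prev"/rel="next" link tags for one page of a series.

    Assumes a hypothetical URL scheme of <base_url>?page=N for every page.
    """
    if not 1 <= page <= last_page:
        raise ValueError("page is outside the series")
    tags = []
    if page > 1:
        # Every page except the first points back to its predecessor.
        prev_url = escape(f"{base_url}?page={page - 1}")
        tags.append(f'<link rel="prev" href="{prev_url}">')
    if page < last_page:
        # Every page except the last points forward to its successor.
        next_url = escape(f"{base_url}?page={page + 1}")
        tags.append(f'<link rel="next" href="{next_url}">')
    return tags


for tag in pagination_link_tags("https://www.example.com/doctors", 2, 3):
    print(tag)
```

The first page gets only a rel="next" tag, the last page only a rel="prev" tag, and every page in between gets both.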
https://productforums.google.com/forum/#!topic/webmasters/YbXqwoyooGM
10/19/2011 – Maile Ohye in Google Forums
If you’ve marked page 2 to n of your paginated series as “noindex, follow” to keep low quality content from affecting users and/or your site’s rankings, that’s fine, you can additionally include rel=”next” and rel=”prev.” Noindex and rel=”next”/”prev” are entirely independent annotations.
This means that if you add rel=”next” and rel=”prev” to noindex’d pages, it still signals to Google that the noindex’d pages are components of the series (though the noindex’d pages will not be returned in search results). This configuration is totally possible (and we’ll honor it), but the benefit is mostly theoretical.
If you believe the user experience on page 2 to n provides little value — so much so that you’ve already marked these pages as noindex — then to ensure that these low-quality pages aren’t returned to users and/or considered in ranking updates such as Panda, even if you choose to add rel=”next” and rel=”prev,” you may want to consider keeping the noindex (or “noindex, follow”).
http://googlewebmastercentral.blogspot.com/2012/03/video-about-pagination-with-relnext-and.html
03/01/2012 – Maile Ohye in Google Forums
“Does rel=next/prev also work as a signal for only one page of the series (page 1 in most cases?) to be included in the search index? Or would noindex tags need to be present on page 2 and on?”
When you implement rel=”next” and rel=”prev” on component pages of a series, we’ll then consolidate the indexing properties from the component pages and attempt to direct users to the most relevant page/URL. This is typically the first page. There’s no need to mark page 2 to n of the series with noindex unless you’re sure that you don’t want those pages to appear in search results.
03/12/2012 – Maile Ohye in YouTube Video
http://googlewebmastercentral.blogspot.com/2014/02/faceted-navigation-best-and-5-of-worst.html
02/12/2014 – Maile Ohye in Google Webmaster Central
Best practices for new faceted navigation implementations or redesigns
New sites that are considering implementing faceted navigation have several options to optimize the "crawl space" (the totality of URLs on your site known to Googlebot) for unique content pages, reduce crawling of duplicative pages, and consolidate indexing signals.
- Option 1: rel="nofollow" internal links
Make all unnecessary URL links rel="nofollow". This option minimizes the crawler's discovery of unnecessary URLs and therefore reduces the potentially explosive crawl space (URLs known to the crawler) that can occur with faceted navigation. rel="nofollow" doesn't prevent the unnecessary URLs from being crawled (only a robots.txt disallow prevents crawling). By allowing them to be crawled, however, you can consolidate indexing signals from the unnecessary URLs with a searcher-valuable URL by adding rel="canonical" from the unnecessary URL to a superset URL.
- Option 2: Robots.txt disallow
For URLs with unnecessary parameters, include a /filtering/ directory that will be robots.txt disallow'd. This lets all search engines freely crawl good content, but will prevent crawling of the unwanted URLs. For instance, if my valuable parameters were item, category, and taste, and my unnecessary parameters were session-id and price. I may have the URL:
- Option 3: Separate hosts
If you're not using a CDN (sites using CDNs don't have this flexibility easily available in Webmaster Tools), consider placing any URLs with unnecessary parameters on a separate host, for example, creating main host www.example.com and secondary host www2.example.com. On the secondary host (www2), set the Crawl rate in Webmaster Tools to "low" while keeping the main host's crawl rate as high as possible. This would allow for more full crawling of the main host URLs and reduces Googlebot's focus on your unnecessary URLs.
- Be sure there remains at least one click path to all items on the main host.
- If you'd like to consolidate indexing signals, consider adding rel="canonical" from the secondary host to a superset URL on the main host.
- Improve indexing of individual content pages with rel="canonical" to the preferred version of a page. rel="canonical" can be used across hostnames or domains.
- Improve indexing of paginated content (such as page=1 and page=2 of the category "gummy candies") by either:
  - Adding rel="canonical" from individual component pages in the series to the category's "view-all" page (e.g. page=1, page=2, and page=3 of "gummy candies" with rel="canonical" to category=gummy-candies&page=all) while making sure that it's still a good searcher experience (e.g., the page loads quickly).
  - Using pagination markup with rel="next" and rel="prev" to consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole.
- Include only canonical URLs in Sitemaps.
- Configure Webmaster Tools URL Parameters if you have strong understanding of the URL parameter behavior on your site (make sure that there is still a clear click path to each individual item/article). For instance, with URL Parameters in Webmaster Tools, you can list the parameter name, the parameter's effect on the page content, and how you'd like Googlebot to crawl URLs containing the parameter.
Note: URL "Parameter Handling" in Webmaster Tools allows the site owner to provide information about the site's parameters and recommendations for Googlebot's behavior.
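The /filtering/ directory approach from Option 2 in the excerpt above can be sanity-checked before deployment. The sketch below uses Python's standard urllib.robotparser module. The robots.txt rules and both URLs are examples I made up to illustrate the idea; they are not the URL from Google's post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows only the /filtering/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filtering/
"""


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URLs only: one valuable listing, one under /filtering/.
good_url = "https://www.example.com/listing?category=gummy-candies"
filtered_url = "https://www.example.com/filtering/listing?price=5-10"
print(good_url, is_crawlable(good_url))
print(filtered_url, is_crawlable(filtered_url))
```

Running a list of real URLs through a check like this is a cheap way to confirm that a disallow rule blocks only what you intended.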
Let's analyze Google's advice:
Option 1: The View All Page
Google clearly favors the View-All page option when the page loads quickly and users can easily find what they are looking for. This means that all items in a paginated series should be listed on the View-All page, and the canonical tags on all the paginated pages should reference the View-All page. The paginated pages in this scenario are there to garner more page views and to make the lists per page more manageable for a user to read. The View-All page is primarily for the search engines.
Coding Instruction for the View-All Option:
- Create a single View-All page with all of the content from the paginated pages within a single series of pagination.
- Once you have created the View-All page, place a rel="canonical" tag in the head section of each paginated component page, referencing the View-All page. (example: ). This tells Google to treat each page in a paginated series as a segment of the View-All page, so queries will return the View-All page rather than a relevant segment page of the pagination chain.
- In Google Webmaster Parameter Handling, set the paginated page parameter to "Paginates" and set Google to crawl every URL.
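As a rough illustration of the canonical step above, the helper below builds the tag that each component page would carry. The function name and the view-all URL are hypothetical; substitute your own URL structure.

```python
from html import escape


def view_all_canonical_tag(view_all_url: str) -> str:
    """Build the canonical tag a component page uses to point at the View-All page."""
    # escape() turns "&" into "&amp;" so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(view_all_url)}">'


# Hypothetical View-All URL; pages 1, 2, 3, ... would all emit this same tag.
print(view_all_canonical_tag("https://www.example.com/candy?category=gummy-candies&page=all"))
```

Every component page in the series emits the identical tag, which is what distinguishes this option from the self-referencing canonicals used with rel="next" and rel="prev".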
View-All Option Works Well:
- If your pagination does not have so many links or images that the View-All page will take a considerable time to load. Five seconds is already stretching the limit for many users, especially on mobile devices. By preferring this option, I believe Google is indicating that it is the most beneficial one. If your View-All pages are too large, then it's time to think about how to break your pagination down to more manageable levels.
- If you don't mind that the View-All page is the only one that is allowed to be indexed in the search engines. This can undermine the main purpose of your pagination, which was to get more page views, as you want users to scroll through the navigation in manageable chunks of data.
Option 2: Block Pagination Beyond Page One
In some instances, you may want to structure your website so that the search engines do not access the paginated series of pages after the first page. This means that every product must have internal links from a first page of listings. This can be difficult to structure, but I have seen some sites use this method successfully. This method ensures that the crawler will not needlessly crawl unimportant pages and that only your first main representative page will be indexed by the search engines. Be cautious using this option, as it will prevent search engines from indexing content in the rest of the article or from finding any products listed after the first page. If you need to stuff in additional categorization to link to every product URL or article from a first page, this option can have the unintended consequence of a poor user experience, and Google will certainly take notice of that.
Coding Instruction for the Blocking Pagination Option:
- Add a rel="nofollow" attribute to all links to the paginated pages.
- Since the paginated pages will not get crawled, all link equity that the links receive will not get transferred. To prevent loss of page juice, you should limit the number of paginated links that will be shown on the first page.
- In Google Webmaster Tools, under the Parameter Handling section, set the paginated page parameter to "Paginates" and set Google to crawl "No URLs". This is another setting that requires extreme caution, as parameters can be shared across various sections of the website and may have negative unintended consequences. If you are not confident and comfortable with these settings, leave the setting at "Let Googlebot Decide".
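The first two steps above can be sketched as a helper that renders the pagination links shown on page one. It is a sketch under assumptions: the function name, the ?page=N URL scheme and the default cap of three links are all hypothetical.

```python
from html import escape


def nofollow_pagination_links(base_url: str, last_page: int, max_links: int = 3) -> list[str]:
    """Render links to pages 2..N for display on page one, each marked rel="nofollow".

    max_links caps how many paginated links appear, since each one is a
    link on the page that passes no equity onward.
    """
    final = min(last_page, 1 + max_links)
    links = []
    for n in range(2, final + 1):
        href = escape(f"{base_url}?page={n}")
        links.append(f'<a href="{href}" rel="nofollow">{n}</a>')
    return links


for link in nofollow_pagination_links("https://www.example.com/doctors", last_page=40):
    print(link)
```

With the default cap, a 40-page series shows links to pages 2, 3 and 4 only; users can still reach deeper pages by stepping forward from those.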
Blocking Pagination works well if:
- Other pages on the site do not pass link equity to the paginated pages.
- All pages on the site are linked internally on pages the search engines are allowed to crawl and the links are allowed to pass link equity.
Option 3: Implement Pagination Relationships
This option requires the use of the rel="next" and rel="prev" tags, which establish the relationship between all pages in a paginated series. This coding relationship protects the paginated pages from being seen as duplicates. The robots "noindex,follow" tag can be implemented on the paginated pages if you believe there is absolutely no purpose for them to surface in the Google index. This method ensures that link equity will not be wasted. The downside is that if you have excessive pagination, the crawlers may get caught up in crawling the paginated pages and not crawl key areas of your site.
Coding Instruction for the Relationship Option:
- Implement the rel="prev" and rel="next" tags to indicate a sequence of paginated pages.
- Each page in this paginated series can have the same title tag, meta description and H1 tags. However, if you are allowing the paginated pages to get indexed, you may choose to have targeted keywords in all these tags instead.
- Each page should have its canonical tag set to its own URL and not to the first page. If the URLs have a tracking ID or extra parameters, the canonical tag may need extra consideration.
- If you don't want the paginated pages to get indexed, set a robots meta tag to "noindex,follow" in the head section of every page in the paginated series, excluding the first page. I will refer to this as Option 3B in the table below.
- In Google Webmaster Parameter Handling, set the paginated page parameter to "Paginates" and set Google to crawl every URL.
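The canonical and robots steps in the list above might look like the sketch below. The tracking parameter names, the function names and the URLs are assumptions for illustration; the set of parameters to strip depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def self_canonical_url(url: str) -> str:
    """Return the page's own URL with tracking parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_and_robots_tags(url: str, page: int, noindex_component_pages: bool = False) -> list[str]:
    """Head tags for one page: a self-referencing canonical, plus the
    Option 3B robots tag on pages after the first when requested."""
    tags = [f'<link rel="canonical" href="{escape(self_canonical_url(url))}">']
    if noindex_component_pages and page > 1:
        tags.append('<meta name="robots" content="noindex,follow">')
    return tags


requested = "https://www.example.com/doctors?page=3&utm_source=newsletter"
for tag in canonical_and_robots_tags(requested, page=3, noindex_component_pages=True):
    print(tag)
```

Note that the canonical keeps the page parameter, so page 3 points to itself rather than to page one. The rel="next" and rel="prev" tags would be emitted alongside these.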
Paginated Relationships works well if:
- It can be implemented correctly. This extra coding can be challenging for some sites.
- You don't have excessive pagination and the crawlers are not having trouble crawling your entire site.
A single site can use one or all of the options shown above. Each pagination template on your site should be reviewed thoroughly to see which option makes sense, and you may choose different options for different content sections. I checked a selection of competitor sites, and they all use Option 2 (block pagination) or Option 3 (paginated series). I again want to stress the major challenges with Option 2: it requires perfect implementation to work correctly, so the safer choices are Option 1 (View-All page) or Option 3 (paginated series). I would surmise that although Google is promoting Option 1 (View-All page), most webmasters have not figured out how to fit the View-All page into their user experience and therefore will not implement it. However, if Google is promoting the View-All option, I am sure Google has discovered that searchers prefer it, so webmasters may sometimes need to cast aside their own business objectives.
NYTimes and Zocdoc use Option 2 and block all pagination pages from getting crawled and indexed. The other sites all use Option 3, with Vitals setting the robots tag on the pagination pages to "noindex,follow". Avvo's strategy is a combination of Option 1, with the canonical tag set to the primary page, and Option 2, with the links to pagination tagged as "nofollow". It is advisable not to mix or combine the various strategies, as doing so risks sending the wrong signals to the search engines.
Major Pagination Challenges with All Options:
- Pay close attention to the crawler settings in Webmaster Tools and also to your log files. Make sure Google is properly crawling all intended areas of the site.
- Make sure the parameter handling, robots.txt file, robots tag, anchor tag settings (follow or nofollow) and canonical tags all complement each other and are implemented correctly. This is where most sites misconfigure their pagination.
- If your pagination is JavaScript driven, you should make sure that users can still access the pagination even when they disable JavaScript. More importantly, the crawlers will not crawl the paginated pages if this functionality is not enabled.
- Endless pagination is a major concern. If your last pagination page has the URL http://www.example.com/page4, then a request for http://www.example.com/page5 should result in a 404, and page4 should not have a rel="next" pointing to page 5. This sounds obvious, but it is a common issue that can cause the crawlers to get bogged down and stuck in your pagination.
- Include only crawler accessible canonical URLs in your XML and HTML sitemaps. All URLs that are blocked by robots.txt, "noindex" robots tag, non-self canonicals and redirected URLs should not be included in the sitemaps. Only the first URL in a paginated series using "next" and "prev" should be included in the sitemaps.
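The last two points in the list above lend themselves to simple automated checks. The sketch below is one possible shape for them. The Page record, its field names and the function names are hypothetical, and a real audit would populate them from a crawl of your own site.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    canonical: str               # value of the page's rel="canonical" tag
    series_page: int = 1         # position within its paginated series
    noindex: bool = False        # carries a "noindex" robots tag
    blocked_by_robots: bool = False
    redirects: bool = False


def status_for_page(requested_page: int, total_items: int, items_per_page: int) -> int:
    """Return 200 for pages inside the series and 404 for anything beyond it."""
    last_page = max(1, math.ceil(total_items / items_per_page))
    return 200 if 1 <= requested_page <= last_page else 404


def sitemap_urls(pages: list[Page]) -> list[str]:
    """Keep only crawlable, indexable, self-canonical first pages of a series."""
    return [
        p.url
        for p in pages
        if not (p.blocked_by_robots or p.noindex or p.redirects)
        and p.canonical == p.url
        and p.series_page == 1
    ]


# 40 items at 10 per page gives 4 pages, so a request for page 5 is not found.
print(status_for_page(4, 40, 10), status_for_page(5, 40, 10))
```

The same last_page value should drive the rel="next" logic, so that the final page never advertises a successor that returns a 404.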
Pagination is complicated. I hope that this article provides enough insight so that you can plan a proper strategy and provide the search engines with logical paths and quality content. These methods will allow the search engines to crawl efficiently, resulting in strong rankings for your site content.