A SHORT INTRODUCTION…
It is well known that search engine robots visit popular pages more frequently, i.e. those with the largest number of incoming links, both internal and external. A website's architecture usually correlates with the popularity of its pages, expressed as the number of backlinks:
- Home page has the most backlinks,
- 1st (e.g. product categories), 2nd & 3rd level pages obtain fewer links,
- finally the least important are deep pages (with articles, classified ads, product pages, etc).
The relationship between the “importance” of web pages and the website architecture was illustrated in one of Rand’s posts, titled “Diagrams for Solving Crawl Priority & Indexation Issues”:
Important pages tend to have a different priority of indexation, and this was also presented very nicely by Rand:
Purple spots are those with the highest number of external links. As can be seen, the pages close to them take some of that popularity and pass part of it further (pink spots). All the other spots stand for pages that are too far from the entry points of search engine robots, which means that their chance of being indexed is much smaller.
In the case of classifieds websites, which contain a lot of content, the above diagram should include subsequent category listing or search results pages. They are obviously less important than the main category pages, but their indexing additionally influences the indexation of their components: the ad detail pages. This is particularly important when the listing starts with so-called premium ads, which change less often than standard classifieds.
BEFORE THE TEST…
With this theory in hand, we decided to see how it works in practice. We analyzed the website http://www.morusek.pl (classifieds for animals and pets from Poland), which has more than 100,000 indexed pages in total. Using a combination of “site” and “inurl” queries, we checked how many pages with a list of classifieds (in Polish “ogloszenia”) were indexed: http://www.google.pl/search?q=site%3Awww.morusek.pl+inurl%3A%22%2F0%2F%22+inurl%3Aogloszenia
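The query URL above can also be rebuilt programmatically, which is handy when the same check has to be repeated for several URL patterns. The following is a minimal sketch using only Python's standard library. The function name `build_site_query` and its parameters are hypothetical; the only pattern confirmed by this test is `/0/` combined with `ogloszenia`.

```python
from urllib.parse import urlencode


def build_site_query(host, url_fragments, google_host="www.google.pl"):
    """Return a Google search URL combining one site: and several inurl: operators.

    `host`, `url_fragments` and `google_host` are illustrative parameters,
    not part of any official API.
    """
    terms = [f"site:{host}"]
    for fragment in url_fragments:
        if "/" in fragment:
            # Fragments containing slashes are quoted, as in the query above.
            terms.append(f'inurl:"{fragment}"')
        else:
            terms.append(f"inurl:{fragment}")
    # urlencode percent-encodes ':', '"' and '/' and turns spaces into '+'.
    return f"http://{google_host}/search?" + urlencode({"q": " ".join(terms)})


query_url = build_site_query("www.morusek.pl", ["/0/", "ogloszenia"])
print(query_url)
```

Keep in mind that the counts returned for such queries are estimates, so they are better suited to comparing trends than to reading absolute numbers.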
The initial results were the following:
To continue the analysis, we excluded the first pages, as their numbers are influenced by the existence of some category pages that currently have no classifieds but are indexable (there are crawlable links in the menu). In addition, to verify the effectiveness of the “site” query, we took into account the number of pages reported by Google Webmaster Tools (GWT) under “Internal Links”. The results were as follows:
WHAT’S IMPORTANT TO KNOW?
The first conclusion is obviously that the higher the page number, the lower the probability that the page will be indexed. Secondly, while the actual numbers from GWT and the “site” query vary a lot, the trends (slopes) are almost the same. On average, the chance that the robot will crawl to the next page of search results decreases by 1.2-1.3% per page.
It is also interesting that, according to Google Webmaster Tools, pages 2 to 4 have a good indexation ratio, which then decreases dramatically at page 5. For example, for pages number 4 the level of indexation is 60%, while for pages number 15 it falls below 30% (according to Google Webmaster Tools) or 40% (according to the “site” command in Google). This is because Googlebot has a much longer way to reach the appropriate link in the latter case (a link to page 15 first appears on page 12), while there are direct links to pages 2, 3 and 4 on the first pages of search listings (see below):
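To compare the two data sources by trend rather than by absolute numbers, one can compute an indexation ratio per page number and fit a straight line to it. The sketch below shows one possible way to do that with an ordinary least-squares slope. The function names are hypothetical and the counts are made-up placeholders chosen only to illustrate the calculation; they are not the measurements from this test.

```python
def indexation_ratios(indexed, existing):
    """Percentage of existing listing pages that are indexed, per page number."""
    return {
        page: 100.0 * indexed.get(page, 0) / existing[page]
        for page in sorted(existing)
        if existing[page] > 0
    }


def slope_per_page(ratios):
    """Least-squares slope of the ratio (in percentage points) per page number."""
    pages = list(ratios)
    values = [ratios[page] for page in pages]
    mean_x = sum(pages) / len(pages)
    mean_y = sum(values) / len(values)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, values))
    variance = sum((x - mean_x) ** 2 for x in pages)
    return covariance / variance


# Placeholder counts (NOT real data): page number -> number of pages.
example_existing = {2: 400, 3: 400, 4: 400, 5: 400, 6: 400}
example_indexed = {2: 240, 3: 235, 4: 230, 5: 225, 6: 220}

example_ratios = indexation_ratios(example_indexed, example_existing)
example_slope = slope_per_page(example_ratios)
print(example_ratios)
print(example_slope)
```

Running the same two functions on the “site” counts and on the GWT counts gives two slopes that can be compared directly, even when the absolute numbers differ.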
THE SUBJECT OF THE TEST: INTRODUCING MORE LINKS
We decided to test how the indexation ratios would change if we introduced more links to subsequent ad listing pages. On the first page of each category we added links to the 5th, 10th and 15th pages, as shown in the picture below:
After a month we measured the changes. Because the “site” command in Google returned inaccurate results (the number of indexed pages seemed to be greater than the actual number of pages), we present data from Google Webmaster Tools (internal links) only:
THE RESULTS
The graph clearly shows that the indexation of the pages newly linked from the first page (pages 5, 10 and 15) is much higher after the change, and actually equals the indexation of pages 2, 3 and 4.
However, the increase in indexation of the pages directly linked from the first page of the listing did not affect the indexation of the neighbouring pages. For example, we can see a huge increase for page 10, but there is no change for pages 9 and 11. The conclusion is that for Googlebot these pages are too far from the points of entry. Only category pages for the main region have incoming links. To index page 9 of the intersection of a category and a region, the robots would have to follow this path:
- main category page (entry point),
- category page + region (first page of results),
- category page + region (tenth page of results),
- category page + region (page 9 of the results).
To make matters worse, not all the category pages have incoming links.
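The path above can be expressed as a click depth: the minimum number of links a robot has to follow from an entry point. The sketch below models a single category + region listing and computes depths with a breadth-first search. It rests on several assumptions: a 15-page listing, and a pagination block that links three pages forward and three pages back (the forward part is inferred from the observations that page 1 links to pages 2-4 and page 12 links to page 15; the backward part is a guess). All function names are hypothetical, and real crawlers do not necessarily traverse a site breadth-first.

```python
from collections import deque


def build_listing_graph(num_pages, window=3, extra_from_first=()):
    """Link graph of one paginated listing; 'entry' is the main category page."""
    graph = {"entry": [1]}
    for page in range(1, num_pages + 1):
        # Assumed pagination block: `window` pages before and after the current one.
        graph[page] = [
            other
            for other in range(page - window, page + window + 1)
            if 1 <= other <= num_pages and other != page
        ]
    # Extra links added on the first page (e.g. to pages 5, 10 and 15).
    for target in extra_from_first:
        if target <= num_pages and target not in graph[1]:
            graph[1].append(target)
    return graph


def click_depths(graph, start="entry"):
    """Minimum number of links to follow from `start` to reach each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in depths:
                depths[target] = depths[node] + 1
                queue.append(target)
    return depths


before = click_depths(build_listing_graph(15))
after = click_depths(build_listing_graph(15, extra_from_first=(5, 10, 15)))

for page in (9, 10, 11, 15):
    print(page, before[page], after[page])
```

Under these assumptions page 10 moves to two clicks from the entry point, while pages 9 and 11 remain three clicks away, matching the four-step path listed above.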
THE CONCLUSIONS
For classifieds or e-commerce websites, the conclusion is that the more pages are linked in the listing, the greater the chance that they will be indexed. In general, it is clear that the farther a page is from the point of entry (external link), the lower the chance that it will be indexed. Therefore, it is advisable not to create sites with a very deep structure and to remember that the pages far from the points of entry should be additionally linked to (for example as “similar products”, “see also”, “related categories”, etc.).
Looking at the chart we can see yet another change: a slight decrease in indexation of pages 2, 3 and 4. This may be either because new pages were added recently and have not been indexed yet (when the number of ads in a certain category started to exceed the space on the first page), or because of the increase in the number of outgoing links on the first page. I would bet on the first explanation, because the new links were in fact added to only a small percentage of pages. There are only 400 fifth pages (so the links to fifth pages were placed on 0.5% of all the first pages). Pages 10 and 15 are even less numerous.
The introduction of additional links has not increased the level of indexation of the classifieds themselves; however, I suppose that the rate of change was simply too small to affect their indexation. Moreover, the indexation of ads on Morusek.pl already exceeded 80% when the experiment started. Such changes can produce a visible increase in the number of indexed pages for sites where the rate of change is much higher and the level of indexation of classifieds or products is lower.