If you’ve been following my posts on Linkscape’s index, you know that we’ve been aiming for fresher, better and larger indices over the past few months, but have run into some very tough challenges. It turns out that indexing the web, canonicalizing millions of pages and calculating a link graph with quality metrics is super-hard; who knew? 🙂
As part of those efforts, we’ve been working toward an experimental index that leverages a more search-engine-style crawler, one that recrawls fresher pages/sites more often and staler ones less frequently. That index, however, is taking its sweet time (and we’re doing a lot of babysitting and monitoring to make sure it goes smoothly). Our tentative plan is to launch that index in the next 2 weeks, but since our last index came out at the very end of November, we felt a new one with fresher data was warranted. Hence, last night, we launched an interim index with the following metrics:
- 36,660,519,013 (36 billion) URLs
- 427,626,242 (427 million) Subdomains
- 128,149,029 (128 million) Root Domains
- 387,656,119,262 (387 billion) Links
- Followed vs. Nofollowed
  - 2.05% of all links found were nofollowed
  - 55.00% of nofollowed links are internal, 45.00% are external
- Rel Canonical – 10.57% of all pages now employ a rel=canonical tag
- The average page has 69.12 links on it (a negligible change from the last index)
  - 57.76 internal links on average
  - 11.36 external links on average
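For anyone who wants to turn the percentages above into rough absolute counts, here is a minimal Python sketch. It assumes the 2.05% nofollow share applies to the full 387,656,119,262 links in this index, which the figures above suggest but don't state outright, and since the published percentages are rounded, the results are only estimates. The variable and function names are made up for illustration.

```python
# Figures quoted in this post
TOTAL_LINKS = 387_656_119_262
NOFOLLOW_SHARE = 0.0205            # 2.05% of all links found
INTERNAL_SHARE_OF_NOFOLLOW = 0.55  # 55.00% of nofollowed links
AVG_INTERNAL_LINKS = 57.76
AVG_EXTERNAL_LINKS = 11.36


def summarize():
    """Estimate absolute counts from the rounded percentages (assumption:
    the nofollow share is measured against TOTAL_LINKS)."""
    nofollowed = TOTAL_LINKS * NOFOLLOW_SHARE
    return {
        "nofollowed_links": round(nofollowed),
        "nofollowed_internal": round(nofollowed * INTERNAL_SHARE_OF_NOFOLLOW),
        "nofollowed_external": round(nofollowed * (1 - INTERNAL_SHARE_OF_NOFOLLOW)),
        # Internal + external averages should add up to the per-page average
        "avg_links_per_page": round(AVG_INTERNAL_LINKS + AVG_EXTERNAL_LINKS, 2),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,}")
```

Under that assumption, the nofollowed total works out to roughly 7.9 billion links, and the internal and external per-page averages sum to the 69.12 figure quoted above.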
This index is smaller than our last few, but the numbers look reasonably solid and the data’s from the first few weeks of December, so it should be helpful to all you link builders and analyzers. Do be aware, though, that this update is likely to last only a couple of weeks before we replace it with our new version, for which we have high expectations (but don’t want to promise the moon just yet).
Also noteworthy – last night, when the index first launched, we experienced some wackiness with Page and Domain Authority scores. Those should have largely settled down to normalcy now, but if you see anything odd, please let us know.