In the last 10 months, we’ve taken a number of dramatic steps to improve the link information available to webmasters & SEOs. Today, I’m pleased to announce even more progress in that direction, as well as cover the impressive store of data now accessible.
Sections in this post:
- Linkscape’s Web Index Over Time
- Upgrades to Link Data & Metrics
- Tools to View Link Information
- The Future of Link Data
Linkscape’s Web Index Over Time
When Linkscape first launched last October, it featured ~30 Billion URLs – impressive, but much smaller than the depth we’ve reached today. Today we’re announcing our July index update (technically launched late last week) with 48.5 Billion URLs, slightly smaller than our last index, but with a greater focus on quality and less spam/junk.
# of Links in Each Index – March-July 2009
# of Pages in Each Index – March-July 2009
# of Root & Sub Domains in Each Index – March-July 2009
Perhaps not surprisingly, though, the more we crawl, the more evident it becomes that much of the web is fairly useless to index or serve. So while these numbers are meaningful to an extent, most of our work doesn’t change these statistics (and some of it actually decreases them), yet that work still contributes to improving the quality of our index.
July Linkscape Update: Upgrades to Link Data & Metrics
July’s index is the first to feature several important upgrades:
#1 – The “Via 301” Link Flag
When requesting link data for a site or page, we’ll now show you important links that point to URLs that 301 redirect to that location. I still recall early feedback from Danny Sullivan, who was very upset that Linkscape didn’t show him many of what he considered the “most important links” to SearchEngineLand.com. As it turned out, a large number of those pointed to www.searchengineland.com (which does 301 redirect), hence the confusion. For deciding on a 301 strategy, people sometimes run reports on a 301’ed URL to see just the links pointing through it; that still works, but those links now also appear on reports for the target of the 301.
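The attribution logic described above can be sketched in a few lines. This is a hypothetical simplification (the data structures, `redirects` map, and `links_to` helper are all illustrative, not SEOmoz’s actual implementation), but it shows how a link pointing at a redirecting URL gets credited to the redirect’s target with a “via 301” flag:

```python
# Known 301 redirects discovered during a crawl: source URL -> target URL
# (hypothetical data; the real index stores far more per-link detail)
redirects = {
    "http://searchengineland.com/": "http://www.searchengineland.com/",
}

# Raw links found on the web: (linking page, link target)
raw_links = [
    ("http://example.com/a", "http://searchengineland.com/"),
    ("http://example.com/b", "http://www.searchengineland.com/"),
]

def links_to(url, raw_links, redirects):
    """Return links pointing at `url`, including those that reach it via a 301.

    Each result is (linking page, original target, flag), where flag is
    None for a direct link or "via 301" for a redirected one.
    """
    results = []
    for source, target in raw_links:
        if target == url:
            results.append((source, target, None))       # direct link
        elif redirects.get(target) == url:
            results.append((source, target, "via 301"))  # redirected link
    return results
```

Run against the sample data, a report for www.searchengineland.com now surfaces both links – the direct one and the one that previously only appeared on the non-www report.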
#2 – MozRank “Evaporation” through NoFollows
The mozRank algorithm now “evaporates” link juice through nofollowed links in much the same fashion Google described for their change to PageRank. For those wondering why the SEO world didn’t notice the nofollow change, there’s some fairly compelling information in the correlation data between mozRank & Google’s toolbar PageRank:
Note that because mozRank offers greater refinement (e.g. 5.57 vs. just “5”), even a “perfect” correlation with integer toolbar PageRank would average an error of 0.25 – the expected absolute difference when rounding a continuous value to the nearest whole number. The MAE (Mean Absolute Error) remains remarkably close to that floor, so the change in nofollow treatment clearly had only a very slight impact.
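For the curious, “evaporation” can be illustrated with a toy PageRank-style iteration. This is a deliberately simplified sketch (the graph, damping value, and `iterate` function are illustrative assumptions, not the real mozRank computation): a page’s juice is split across *all* of its outlinks, but the shares assigned to nofollowed links simply vanish instead of being redistributed to the remaining links.

```python
DAMPING = 0.85  # illustrative damping factor, as in the classic PageRank formulation

def iterate(ranks, outlinks):
    """One update step of a toy rank calculation with nofollow evaporation.

    ranks:    page -> current rank value
    outlinks: page -> list of (target, nofollowed) pairs
    """
    # Base rank every page receives regardless of links
    new = {page: (1 - DAMPING) / len(ranks) for page in ranks}
    for page, links in outlinks.items():
        if not links:
            continue
        # The share is divided across ALL outlinks, followed or not...
        share = DAMPING * ranks[page] / len(links)
        for target, nofollowed in links:
            if not nofollowed and target in new:
                new[target] += share  # ...but only followed links pass it on;
            # nofollowed shares "evaporate" rather than flowing elsewhere
    return new
```

Running this on a page with one followed and one nofollowed link shows the effect: the followed target gains juice, the nofollowed target gets only the base amount, and the total rank in the system shrinks – exactly the evaporation behavior described above.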
#3 – Canonical URL Tags Now Indexed
Although this began in our last index, it’s worth noting that canonical URL tags are now being picked up and indexed – we count around 35 million of them. Until it becomes more evident exactly how the different search engines treat the tag, however, we’re holding off on anything drastic, like always trusting it in our canonicalization code. This means that unless URLs are canonicalized for other reasons, we still produce separate reports for different URLs, though you may see some “canonical tag” links in a few places.
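Picking the tag up during a crawl is straightforward. Here’s a minimal sketch using Python’s standard-library HTML parser (the `CanonicalFinder` class and `find_canonical` helper are illustrative names, not part of our crawler):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Pull the rel="canonical" href out of a page's markup, if present."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # <link rel="canonical" href="..."> is the tag the engines announced
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if absent."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical
```

The harder question isn’t extraction – it’s what to do with the value once you have it, which is exactly why we’re indexing the tags without yet folding them into canonicalization.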
#4 – Large Sites Have More Consistent Link Data
Although we’re still a few updates away from crawling as deeply as we’d like on large sites, this latest index shows considerably more and better data about “important pages” on “important domains.” Some of our users noticed that although we often had a number of pages from large sites, they were frequently not the top-level or most linked-to pages – this fix works to address that. Future indices will multiply this capacity considerably.
#5 – Blogscape’s Data Helping Linkscape Stay Fresher
One of the best features of newer Linkscape indices is their inclusion of fresher link data from the blogosphere and “fresh web” (social media sites like Twitter, Facebook, LinkedIn, web forums and others that push data out via feeds). Linkscape is now sucking down link data from Blogscape’s fresh crawl of the web (updated from 10 million+ feeds every 3 hours) and pushing that out in index updates. Linkscape still has the delay between updates, but the link data produced is now considerably better at showing important links from the fresh web.
Tools to View Link Information
This may be a bit overwhelming, but it’s also very, very cool 🙂 As you probably know, Linkscape data is infiltrating all sorts of tubes on the Internet. Here’s a smattering:
Quirk’s SearchStatus Bar
The good folks over at Quirk.biz have baked mozRank into their SearchStatus Firefox extension:
SEOmoz’s MozBar
If you haven’t yet installed the MozBar, I highly recommend it. I’m also very, very excited for the upgrade coming out a few weeks from today. In fact, I’m so excited, I’m leaking a spliced up screenshot (because the 800 pixel wide bar won’t fit in this 600 pixel wide post):
As we noted above, knowing the number of linking root domains is critical to SEO link analysis, so we’re packing it into the new release. That “analyze page” button is going to be seriously awesome, too. Sadly, as I mentioned in my previous post about SEO operators, we’ve been asked by Google to remove PageRank from our toolbar, but there are lots of other third-party extensions that can provide it, like the above SearchStatus bar.
The Free Linkscape API
Our free API serves millions of requests every month, spreading link data far and wide. If you have an application, an internal tool, or hate manually importing data (like I do), check out the API and Nick’s post on the subject.
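To give a flavor of programmatic access, here’s a tiny sketch of building a metrics request URL. Note the endpoint, parameter names, and column value here are placeholders – check the API documentation from Nick’s post for the real request format and authentication details:

```python
from urllib.parse import quote, urlencode

# Placeholder base URL and parameters -- consult the official API docs
# for the actual endpoint, metric column flags, and signing requirements.
API_BASE = "http://api.example.com/linkscape/url-metrics/"

def metrics_request_url(target, access_id, cols):
    """Build a request URL asking for link metrics on `target`.

    The target URL is percent-encoded into the path; `access_id` and
    `cols` (a hypothetical metrics selector) go in the query string.
    """
    query = urlencode({"AccessID": access_id, "Cols": cols})
    return API_BASE + quote(target, safe="") + "?" + query
```

From there it’s one HTTP GET and a JSON parse away from feeding link metrics into your own tools or spreadsheets.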
Top Pages on a Domain
One of my very favorite tools on the web for SEO (and Richard Baxter’s too!), Top Pages lets you enter any domain or subdomain and see the pages on it that have received the largest number of links from unique root domains. The signal to noise ratio is fantastic and it’s remarkably useful for both internal analysis (Do I have opportunities I’m not executing on? Where do I have some spare link juice? What pages might perform best for given keywords?) and competitive information (What is my competition doing that’s bringing them links?).
Smashing Magazine has done some serious Linkbait!
Backlink Anchor Text Analysis Tool
When you need to see anchor text distribution across thousands of links in a few seconds, there’s nothing else like the Backlink Anchor Text Analysis tool. Upgraded this Spring to show Linkscape data, it features sub-30-second runtimes and phenomenal comprehensiveness.
Poor Dave… His friends aren’t using good keywords to link to him. Here you go, buddy – UK SEO
If you’d like even more functionality (particularly the ability to choose a subdomain, root domain or individual URL), the labs version of this tool is also quite excellent.
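The aggregation behind an anchor text report boils down to counting normalized phrases. A minimal sketch (the `anchor_text_distribution` helper and its input shape are assumptions for illustration, not the tool’s internals):

```python
from collections import Counter

def anchor_text_distribution(links):
    """Summarize anchor text across a set of links.

    links: list of (anchor_text, linking_url) pairs.
    Returns (phrase, count) pairs, most common first, with anchors
    normalized by trimming whitespace and lowercasing.
    """
    counts = Counter(anchor.strip().lower() for anchor, _ in links)
    return counts.most_common()
```

The real work, of course, is gathering thousands of links from the index in under 30 seconds – the counting itself is the easy part.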
Linkscape Data Visualization Tool
The most recent addition to the Labs family, Nick’s amazing visualizer tool helps show exactly where strengths and weaknesses exist by comparing many of the data points Linkscape calculates on a scale using Ben’s preliminary rank modeling:
Everybody loves a good radar chart
Basic Linkscape Reports
The classic Linkscape reports still provide a great depth of data and metrics, but you need to know where to look (we obviously have some usability work to do). The juiciest stuff is in the “data detail” tab:
Wow… Twitter gets a LOT of links
Advanced Linkscape Reports
For digging deep into the links that point to a page/site and the associated metrics, advanced reports are still the best source of access.
I’ve got more to write about Oyster.com in the near future (and not just because their namesake is delectable)
The Future of Link Data
There’s clearly been a lot of exciting progress made, but it doesn’t hold a candle to what’s possible. Marketers need data – and SEOmoz’s obligation (and mission) is to answer that call. What’s been done to date hasn’t been easy, and what lies ahead is even harder, particularly making many pieces of incredibly complex information simple and actionable. But if we wanted easy, crawling the web and building query-independent search ranking metrics probably wasn’t the way to go 🙂
Some of the biggest things we’re thinking about for the future include:
- Crawling deeper and producing more frequent index updates
- Showing historical link information (this one is especially challenging because of index and web size fluctuations)
- Illustrating more about internal link architectures on a site and providing recommendations for improvement
- Building ranking models that predict actions that will drive up organic rankings
- Visualizing important data about links, pages, keywords and global metrics
Again, I’ll share a brief taste of what’s ahead (remember, these are just concept wireframes):
The future looks bright indeed.
As always, we rely on the feedback of our members and the SEO community to help us improve the information provided. Please leave any requests or questions in the comments or send them over to [email protected].