Big Data, Big Problems: 4 Major Link Indexes Compared

Ali JalilPour June 27, 2024

Given this blog’s readership, chances are good you will spend some time this week looking at backlinks in one of the growing number of link data tools. We know backlinks continue to be one of the most important parts of Google’s ranking algorithm, if not the most important. We tend to take these link data sets at face value, though, in part because they are all we have. But when your rankings are on the line, is there a better way to work out which data set is the best? How should we go about assessing link indexes like Moz, Majestic, Ahrefs and SEMrush for quality? Historically, there have been 4 common approaches to this question of index quality…

Breadth: We might choose to look at the number of linking root domains any given service reports. We know that the number of referring domains correlates strongly with search rankings, so it makes sense to judge a link index by how many unique domains it has discovered and indexed.
Depth: We also might choose to look at how deeply the web has been crawled, looking more at the total number of URLs in the index than at the diversity of referring domains.
Link Overlap: A more sophisticated approach might count the number of links an index has in common with Google Webmaster Tools.
Freshness: Finally, we might choose to look at the freshness of the index. What percentage of links in the index are still live?

There are a number of really good studies (some newer than others) using these techniques that are worth checking out when you get a chance:

BuiltVisible analysis of Moz, Majestic, GWT, Ahrefs and Search Metrics
SEOBook comparison of Moz, Majestic, Ahrefs, and Ayima
MatthewWoodward study of Ahrefs, Majestic, Moz, Raven and SEO Spyglass
Marketing Signals analysis of Moz, Majestic, Ahrefs, and GWT
RankAbove comparison of Moz, Majestic, Ahrefs and Link Research Tools
StoneTemple study of Moz and Majestic

While these are all excellent at addressing the methodologies above, there is a particular limitation with all of them. They miss one of the most important metrics we need to determine the value of a link index: proportional representation to Google’s link graph. So here at Angular Marketing, we decided to take a closer look.

Proportional representation to Google Search Console data

So, why is it important to determine proportional representation? Many of the most important and valued metrics we use are built on proportional
models. PageRank, MozRank, CitationFlow and Ahrefs Rank are proportional in nature. The score of any one URL in the data set is relative to the
other URLs in the data set. If the data set is biased, the results are biased.
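
To make that concrete, here is a minimal sketch of a PageRank-style calculation on a toy graph. Everything in it is an assumption for illustration: the four-page graph, the damping value of 0.85, and the pagerank function itself, which is a textbook simplification and not the way any of these vendors actually computes its metric.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration; scores always sum to 1."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its score evenly over every page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical "full" graph, and a crawl of it that never found page "d".
full_graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["b"]}
partial_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}

full_scores = pagerank(full_graph)
partial_scores = pagerank(partial_graph)

for page in "abc":
    print(page, round(full_scores[page], 4), round(partial_scores[page], 4))
```

In the partial crawl, pages a, b and c tie. In the full graph, b pulls ahead because of the link from d. Nothing about b changed; only the sample did, which is why a biased index produces biased scores.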

A Visualization

Link graphs are biased by their crawl prioritization. Because there is no full representation of the Internet, every link graph, even Google’s,
is a biased sample of the web. Imagine for a second that the picture below is of the web. Each dot represents a page on the Internet,
and the dots surrounded by green represent a fictitious index by Google of certain sections of the web.

Of course, Google isn’t the only organization that crawls the web. Other organizations like Moz,
Majestic, Ahrefs, and SEMrush
have their own crawl prioritizations which result in different link indexes.

In the example above, you can see different link providers trying to index the web like Google. Link data provider 1 (purple) does a good job of building a model that is similar to Google. It isn’t very big, but it is proportional. Link data provider 2 (blue) has a much larger index, and likely has more links in common with Google than link data provider 1, but it is highly disproportional. So, how would we go about measuring this proportionality? And which data set is the most proportional to Google?

Methodology

The first step is to determine a measurement of relativity for analysis. Google doesn’t give us very much information about their link graph. All we have is what is in Google Search Console. The best source we can use is referring domain counts. In particular, we want to look at what we call referring domain link pairs. A referring domain link pair would be something like ask.com->mlb.com: 9,444 which means that ask.com links to mlb.com 9,444 times.
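
As a rough sketch of what building these pairs could look like, assume you have a list of (source URL, target URL) links exported from a tool. The function names, the sample URLs, and the naive root-domain rule below are our own illustration, not any vendor’s export format.

```python
from collections import Counter
from urllib.parse import urlparse

def naive_root_domain(url):
    """Rough root-domain guess: the last two labels of the hostname.

    Assumption for illustration only. Real code should consult the Public
    Suffix List so that hosts like example.co.uk are handled correctly.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])

def count_link_pairs(links):
    """Count links per (referring root domain, target root domain) pair."""
    pairs = Counter()
    for source_url, target_url in links:
        pairs[(naive_root_domain(source_url), naive_root_domain(target_url))] += 1
    return pairs

# Hypothetical sample links; the paths are made up.
sample_links = [
    ("http://www.ask.com/sports", "http://mlb.com/"),
    ("http://ask.com/news", "http://www.mlb.com/scores"),
    ("http://example.org/", "http://mlb.com/"),
]

pairs = count_link_pairs(sample_links)
for (source, target), total in sorted(pairs.items()):
    print(f"{source}->{target}: {total}")
```

Each key in the resulting counter is one referring domain link pair, and its value is the number of links between the two root domains.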

Steps

Determine the root linking domain pairs and values to 100+ sites in Google Search Console
Determine the same for Ahrefs, Moz, Majestic Fresh, Majestic Historic, SEMrush
Compare the referring domain link pairs of each data set to Google, assuming a Poisson distribution
Run simulations of each data set’s performance against each other (e.g. Moz vs Majestic, Ahrefs vs SEMrush, Moz vs SEMrush, et al.)
Analyze the results
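
The steps above don’t spell out the exact calculation, so the following is only one plausible reading of the comparison step, not the method actually used in this study. It treats each Google pair count, scaled down to the size of the index being tested, as the expected rate of a Poisson distribution, then scores the index by the average log-likelihood of the counts it actually reports. The scaling rule, the floor value for pairs Google doesn’t report, the function names, and the toy numbers are all assumptions.

```python
import math

def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of observing k events at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def proportionality_score(google_pairs, index_pairs, floor=0.5):
    """Mean Poisson log-likelihood of an index's pair counts given Google's.

    Higher (closer to zero) means the index's counts look more like a
    scaled-down copy of Google's. The floor avoids a rate of zero for pairs
    that Google does not report; its value is an arbitrary assumption.
    """
    google_total = sum(google_pairs.values())
    if google_total == 0:
        raise ValueError("google_pairs must contain at least one link")
    scale = sum(index_pairs.values()) / google_total
    keys = set(google_pairs) | set(index_pairs)
    total = 0.0
    for key in keys:
        expected = max(google_pairs.get(key, 0) * scale, floor)
        total += poisson_log_pmf(index_pairs.get(key, 0), expected)
    return total / len(keys)

# Toy counts for three hypothetical link pairs pointing at one site.
google = {("a.example", "x.example"): 100,
          ("b.example", "x.example"): 50,
          ("c.example", "x.example"): 10}
proportional = {("a.example", "x.example"): 10,
                ("b.example", "x.example"): 5,
                ("c.example", "x.example"): 1}
skewed = {("a.example", "x.example"): 1,
          ("b.example", "x.example"): 5,
          ("c.example", "x.example"): 10}

print(proportionality_score(google, proportional))
print(proportionality_score(google, skewed))
```

Both toy indexes contain 16 links, but the first keeps Google’s proportions and scores higher. Note that a raw log-likelihood like this is not directly comparable between indexes of very different sizes, which is one reason a real analysis would need simulations or some other normalization on top of it.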

Results

When placed head-to-head, there seem to be some clear winners at first glance. In head-to-head, Moz edges out Ahrefs, but across the board, Moz and Ahrefs fare quite evenly. Moz, Ahrefs and SEMrush seem to be far better than Majestic Fresh and Majestic Historic. Is that really the case? And why?

It turns out there is an inversely proportional relationship between index size and proportional relevancy. This might seem counterintuitive: shouldn’t the bigger indexes be closer to Google? Not exactly.

What does this mean?

Each organization has to create a crawl prioritization strategy. When you discover millions of links, you have to prioritize which ones you might crawl next. Google has a crawl prioritization, and so do Moz, Majestic, Ahrefs and SEMrush. There are lots of different things you might choose to prioritize…

You might prioritize link discovery. If you want to build a very large index, you could prioritize crawling pages on sites that
have historically provided new links.
You might prioritize content uniqueness. If you want to build a search engine, you might prioritize finding pages that are unlike
any you have seen before. You could choose to crawl domains that historically provide unique data and little duplicate content.
You might prioritize content freshness. If you want to keep your search engine recent, you might prioritize crawling pages that
change frequently.
You might prioritize content value, crawling the most important URLs first based on the number of inbound links to that page.
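
To see how much the choice of priority matters, here is a minimal sketch of a best-first crawler over a toy link graph. The graph, the function name, and both priority rules are hypothetical. A real crawler would also have to estimate these priorities from the pages it has already seen, whereas this sketch cheats by reading them from the full graph.

```python
import heapq
from itertools import count

def crawl_order(graph, seed, priority, limit):
    """Return the first `limit` pages visited by a best-first crawl.

    graph: dict mapping page -> list of linked pages (a stand-in for the web)
    priority: function(page) -> number; lower values are crawled first
    """
    tie_breaker = count()  # earlier-discovered pages win when priorities tie
    frontier = [(priority(seed), next(tie_breaker), seed)]
    seen = {seed}
    visited = []
    while frontier and len(visited) < limit:
        _, _, page = heapq.heappop(frontier)
        visited.append(page)
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (priority(target), next(tie_breaker), target))
    return visited

# Hypothetical six-page web.
toy_web = {
    "home": ["blog", "shop"],
    "blog": ["post1", "post2"],
    "shop": ["item1"],
    "post1": ["item1"],
    "post2": [],
    "item1": [],
}

# Inbound link counts, used as a crude "content value" signal.
inbound = {}
for page, targets in toy_web.items():
    for t in targets:
        inbound[t] = inbound.get(t, 0) + 1

# Content value: pages with the most inbound links first.
by_value = crawl_order(toy_web, "home", lambda p: -inbound.get(p, 0), limit=4)
# Link discovery: pages with the most outbound links first.
by_discovery = crawl_order(toy_web, "home", lambda p: -len(toy_web.get(p, [])), limit=4)

print(by_value)
print(by_discovery)
```

With a budget of four pages, both crawls agree on the first three and then split: the value-first crawl picks item1, while the discovery-first crawl picks post1. On a graph the size of the web, small differences like this compound.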

Chances are, an organization’s crawl priority will blend some of these features, but it’s difficult to design one exactly like Google. Imagine
for a moment that instead of crawling the web, you want to climb a tree. You have to come up with a tree climbing strategy.

You decide to climb the longest branch you see at each intersection.
One friend of yours decides to climb the first new branch he reaches, regardless of how long it is.
Your other friend decides to climb the first new branch she reaches only if she sees another branch coming off of it.

Despite having different climb strategies, everyone chooses the same first branch, and everyone chooses the same second branch. There are only
so many different options early on.

But as the climbers go further and further along, their choices eventually produce differing results. This is exactly the same for web crawlers
like Google, Moz, Majestic, Ahrefs and SEMrush. The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a
deficiency; this is just the nature of the beast. However, we aren’t completely lost. Once we know how index size is related to disparity, we
can make some inferences about how similar a crawl priority may be to Google.

Unfortunately, we have to be careful in our conclusions. We only have a few data points with which to work, so it is very difficult to be certain regarding this part of the analysis. In particular, it seems strange that Majestic would get better relative to its index size as it grows, unless Google holds on to old data (which might be an important discovery in and of itself). Most likely, we simply can’t draw conclusions at this level of detail yet.

So what do we do?

Let’s say you have a list of domains or URLs for which you would like to know their relative values. Your process might look something like
this…

Check Open Site Explorer to see if all URLs are in their index. If so, you are looking at the metrics most likely to be proportional to Google’s link graph.
If any of the links do not occur in the index, move to Ahrefs and use their Ahrefs Rank if all you need is a single PageRank-like metric.
If any of the links are missing from Ahrefs’s index, or you need something related to trust, move on to Majestic Fresh.
Finally, use Majestic Historic for (by leaps and bounds) the largest coverage available.
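
The process above boils down to walking the indexes in order and stopping at the first one that covers every URL you care about. Here is a minimal sketch of that fallback. The coverage sets are made-up placeholders; in practice you would fill them by checking each tool for your URLs, and the sketch ignores the case where you specifically need a trust metric.

```python
def pick_index(urls, indexes):
    """Return the name of the first index that contains every URL, else None.

    indexes: ordered list of (name, set_of_known_urls), with the index
    expected to be most proportional to Google listed first.
    """
    wanted = set(urls)
    for name, known_urls in indexes:
        if wanted <= known_urls:
            return name
    return None

# Hypothetical coverage, ordered as in the process above.
indexes = [
    ("Open Site Explorer", {"a.example/1", "a.example/2"}),
    ("Ahrefs", {"a.example/1", "a.example/2", "b.example/1"}),
    ("Majestic Fresh", {"a.example/1", "a.example/2", "b.example/1", "c.example/1"}),
    ("Majestic Historic", {"a.example/1", "a.example/2", "b.example/1",
                           "c.example/1", "d.example/1"}),
]

print(pick_index(["a.example/1", "b.example/1"], indexes))
print(pick_index(["a.example/1", "d.example/1"], indexes))
```

The point of using a single index for the whole list is that relative scores are only comparable within one data set; mixing metrics from two indexes would undo the proportionality you were trying to preserve.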

It is important to point out that the likelihood that all the URLs you want to check are in a single index increases as the accuracy of the metric decreases. Considering the size of Majestic’s data, you can’t ignore them because you are less likely to get null value answers from their data than the others. If anything rings true, it is that once again it makes sense to get data from as many sources as possible. You won’t get the most proportional data without Moz, the broadest data without Majestic, or everything in-between without Ahrefs.

What about SEMrush? They are making progress, but they don’t publish any relative statistics that would be useful in this particular
case. Maybe we can hope to see more from them soon given their already promising index!

Recommendations for the link graphing industry

All we hear about these days is big data; we almost never hear about good data. I know that the teams at Moz, Majestic, Ahrefs, SEMrush and others are interested in mimicking Google, but I would love to see some organization stand up against the allure of more data in favor of better data, meaning data more like Google’s. It could begin with testing various crawl strategies to see if they produce a result more similar to the data shared in Google Search Console. Having the most Google-like data is certainly a crown worth winning.

Credits

Thanks to Diana Carter at Angular for assistance with data acquisition and to Andrew Cron for help with the statistical analysis. Thanks also to the representatives from Moz, Majestic, Ahrefs, and SEMrush for answering questions about their indexes.
