The Second February Mozscape Index is Live!

Ali JalilPour June 27, 2024

We’re continuing the trend of two index releases each month by bringing you the latest Mozscape index release today – only 15 days after our last release on February 12th! The latest Mozscape index took about 11 days to process, with a fairly significant portion crawled at the beginning of February. The crawl data spans about 38 days, so the oldest crawl data dates back to the beginning of January. You can access refreshed data across all of our applications – Open Site Explorer, the Mozbar, PRO campaigns, and the Mozscape API.

Our Big Data processing team (Martin York, Douglas Vojir, and Stephen Wood) has been working on some really exciting improvements to our processing code base that reduce the length of time processing takes, and has also begun development on a highly anticipated new Mozscape index feature.

The Mozscape index is created in one continuous batch processing pipeline. A massive amount of crawl data is downloaded first, then sorted and organized, and then the computations and magic are applied. Every so often, files get uploaded in a checkpoint step, so that if something catastrophic happens to the index, we’ll be able to roll back to a fairly recent step.
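
To make the checkpoint idea concrete, here is a minimal sketch in Python. It is not the Mozscape code: the function name `run_pipeline`, the `checkpoint.json` file, and the `checkpoint_every` parameter are all hypothetical, and a local directory stands in for wherever the real pipeline uploads its files. It only illustrates the general pattern of saving state every few steps and resuming from the last saved step after a failure.

```python
import json
from pathlib import Path


def run_pipeline(steps, state, checkpoint_dir, checkpoint_every=2):
    """Run `steps` in order, saving the state every `checkpoint_every` steps.

    If a checkpoint already exists in `checkpoint_dir`, resume from the step
    after it instead of starting over. (Hypothetical helper, for illustration.)
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    checkpoint_file = checkpoint_dir / "checkpoint.json"

    start = 0
    if checkpoint_file.exists():
        saved = json.loads(checkpoint_file.read_text())
        start, state = saved["next_step"], saved["state"]

    for index in range(start, len(steps)):
        state = steps[index](state)
        if (index + 1) % checkpoint_every == 0:
            # Write to a temporary file and rename it afterwards, so a crash
            # in the middle of a write never leaves a half-written checkpoint.
            tmp = checkpoint_file.with_suffix(".tmp")
            tmp.write_text(json.dumps({"next_step": index + 1, "state": state}))
            tmp.replace(checkpoint_file)
    return state
```

If a later step raises an exception, calling `run_pipeline` again with the same directory skips the steps that were already checkpointed. The trade-off is the one described above: checkpointing more often means less work is lost after a failure, but more time is spent writing files.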

Recently the Big Data processing team dug through this checkpointing code to see where they could optimize – and they really optimized! The time needed to checkpoint files varies throughout the pipeline, but the longest checkpointing step used to take about 60 hours to complete. With the optimization from Doug and Martin, this step now takes 2.18 hours on average, which makes it roughly 27 times faster. Holy time savings!

The first few steps in processing are dedicated to organizing how the work is going to be distributed across the entire Mozscape processing cluster. The files are broken out into what are called shards, which are then assigned across the entire fleet of machines. These shards aren’t always completely full, which means one machine can finish its work well before another. Martin revisited this code as well to see what type of optimization could be applied. With the help of our master data scientist, Matt Peters, Martin was able to improve the distribution of work, saving around 25% of the time spent processing!
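
The post does not say how the distribution of work was actually improved, so the following is only a generic illustration of the underlying problem. It uses a common load-balancing heuristic (largest shard first, always assigned to the machine with the least work so far); the function name `assign_shards` and the shard sizes are made up for the example.

```python
import heapq


def assign_shards(shard_sizes, machine_count):
    """Assign shards to machines greedily: largest shard first, always to the
    machine that currently has the least work. (Illustrative heuristic only.)
    """
    # Heap entries are (current load, machine index); the least-loaded
    # machine is always at the top.
    heap = [(0, machine) for machine in range(machine_count)]
    heapq.heapify(heap)
    assignment = {machine: [] for machine in range(machine_count)}

    by_size = sorted(enumerate(shard_sizes), key=lambda pair: pair[1], reverse=True)
    for shard_id, size in by_size:
        load, machine = heapq.heappop(heap)
        assignment[machine].append(shard_id)
        heapq.heappush(heap, (load + size, machine))

    loads = {
        machine: sum(shard_sizes[shard_id] for shard_id in shard_ids)
        for machine, shard_ids in assignment.items()
    }
    return assignment, loads
```

The whole job finishes only when the busiest machine finishes, so the figure to watch is the largest value in `loads`. For the made-up sizes `[8, 7, 6, 5, 4]` on two machines, handing shards out in simple alternating order gives loads of 18 and 12, while the greedy assignment gives 17 and 13.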

One feature we hear requested fairly often is including HTTPS crawl data in the Mozscape index. Good news – development on this feature has begun, and we hope to have HTTPS data included in the Mozscape index this summer!

Here are the metrics for this latest index:

82,275,594,589 (82 billion) URLs
9,097,532,641 (9.1 billion) Subdomains
148,991,416 (149 million) Root Domains
829,267,740,331 (829 billion) Links
Followed vs. Nofollowed
- 2.25% of all links found were nofollowed
- 56.08% of nofollowed links are internal
- 43.92% are external
Rel Canonical – 15.43% of all pages now employ a rel=canonical tag
The average page has 73 links on it
- 62.93 internal links on average
- 10.33 external links on average
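
For readers who prefer absolute numbers, the nofollow percentages above can be turned into approximate link counts. This is a small sketch that uses only the figures listed above; the function name `nofollow_breakdown` is made up for the example, and the results are estimates because the published percentages are rounded.

```python
def nofollow_breakdown(total_links, nofollow_share, internal_share):
    """Convert the published percentages into approximate link counts."""
    nofollowed = total_links * nofollow_share
    internal = nofollowed * internal_share
    external = nofollowed - internal
    return {"nofollowed": nofollowed, "internal": internal, "external": external}


# Figures taken from the metrics list above.
breakdown = nofollow_breakdown(
    total_links=829_267_740_331,
    nofollow_share=0.0225,   # 2.25% of all links were nofollowed
    internal_share=0.5608,   # 56.08% of nofollowed links are internal
)

for name, count in breakdown.items():
    print(f"{name}: about {count / 1e9:.2f} billion links")
```

That works out to roughly 18.7 billion nofollowed links, of which about 10.5 billion are internal and about 8.2 billion are external.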

And the following correlations with Google’s US search results:

Page Authority – 0.35
Domain Authority – 0.19
MozRank – 0.24
Linking Root Domains – 0.31
Total Links – 0.25
External Links – 0.29

Crawl histogram for the February 27th Mozscape index

As you can see from the metrics above, the number of subdomains continues to grow, because we have discovered a small number of root domains that each have a substantial number of subdomains associated with them.

We always love to hear your thoughts! And remember, if you’re ever curious about when Mozscape next updates, you can check the calendar here. We also maintain a list of previous index updates with metrics here.
