Earlier this year, Danny Sullivan of Third Door Media asked me if SEOmoz could put together some data comparing ranking elements of Google against those of Bing to help illustrate the potential biases SEOs might face when optimizing for the two engines. Today at SMX Advanced in Seattle, I presented the following data, compiled by our own Ben Hendrickson with help from the entire SEOmoz engineering team (particularly Phil & Chas on the Linkscape side). The results I’m sharing match those in the presentation, with a bit more detail added in for those interested.
Rather than include the entire slide deck, I’ve taken the charts, graphs and data directly from the presentation so those of you seeking to convince clients or motivate internal teams can use them in your own presentations. But, before we begin with the data, I’d like to share a few critical notes about this research that shouldn’t be ignored.
Goals of the Correlation Data Research
With this research, we hope to accomplish three big things:
- Add a new source of data to SEOs’ understanding of how Google & Bing rank web pages
- Bring more science to SEO through a repeatable, peer-reviewed dataset
- Provide recommendations based on our own interpretations AND open the data for interpretation by others as well
Further research, including causation analysis through more sophisticated ranking models and possibly more correlation analysis on other factors, is certainly part of our goals as well.
Methodology
- We collected 11,351 search results from both Google & Bing via Google AdWords suggest data for the various categories (you can see these keywords yourself via Google’s AdWords tool)
- We looked only at the first page of results (which typically included 10 results, but sometimes contained a higher or lower number). We ignored all non-standard results (meaning universal or vertical results such as video, images, local or “instant answers”)
- The correlations relate to higher/lower positional ranking on page 1 of the search results
- We controlled for search results where all (or none) of the results matched the metric. Thus, for example, if we were looking for correlation with .gov domains and no results in the set included a .gov domain, we didn’t use that SERP for that dataset.
- We’ve used Spearman’s correlation coefficient, as it is the standard (and in our opinion, best choice) for ranked datasets. You can read more about this selection via Ben’s comments here and here.
This is very similar to the methodology we used for our recent research on Google PageRank correlation.
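For those who want to try this on their own data, here is a minimal sketch of the per-SERP calculation described above. It is not our production code: the function names are hypothetical, ties are handled with average ranks, and the sign is chosen so that a positive number means "more of the feature goes with a higher position," which is how the charts below should be read. SERPs where every result (or no result) has the feature are skipped, as noted in the methodology.

```python
from statistics import mean

def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation; Spearman is Pearson applied to ranks."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def serp_correlation(feature_values):
    """Spearman correlation between a feature and ranking for one SERP.

    feature_values is listed in ranking order (position 1 first).
    Returns None when the feature is identical for every result,
    since such a SERP carries no information about the feature.
    """
    if len(set(feature_values)) < 2:
        return None
    # Negate positions so position 1 gets the highest "goodness" score.
    goodness = [-position for position in range(1, len(feature_values) + 1)]
    return pearson(average_ranks(feature_values), average_ranks(goodness))

# Example: a yes/no feature (1 = present) found only in the top two results.
print(serp_correlation([1, 1, 0, 0, 0]))
```

A yes/no feature such as "is a .gov domain" works the same way as a numeric one such as URL length; the binary case simply produces many ties.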
Understanding Correlation Significance
The correlation numbers we show range between -0.2 and 0.35, where a perfect correlation would be 1.0 and no correlation would be 0.0.
The standard error for each result set is also included, but tends to be so low in most cases that displaying it on the bar graph would make it nearly invisible. This is thanks to the large number of results collected – we’ve got very high confidence in the statistical significance of these.
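To see why a large sample makes the error bars so small, here is a rough sketch. It assumes the reported figure is the mean of the per-SERP correlations and that the standard error is the usual standard error of that mean (sample standard deviation divided by the square root of the number of SERPs). That is an assumption made for illustration; the function name is hypothetical as well.

```python
from statistics import mean, stdev

def mean_and_standard_error(correlations):
    """Return (mean, standard error of the mean) for per-SERP correlations.

    Assumes at least two values; None entries (skipped SERPs) should be
    filtered out before calling.
    """
    n = len(correlations)
    return mean(correlations), stdev(correlations) / n ** 0.5

# The same spread of values gives a smaller standard error as n grows.
small = [0.1, 0.3] * 5       # 10 SERPs
large = [0.1, 0.3] * 5000    # 10,000 SERPs
print(mean_and_standard_error(small))
print(mean_and_standard_error(large))
```

With the spread held constant, the standard error shrinks roughly with the square root of the sample size, so thousands of SERPs produce error bars far smaller than the correlations themselves.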
Correlation ≠ Causation
It’s long been held in statistical analysis that even very high correlations do not necessarily mean one data set is the cause of the other. People holding umbrellas don’t cause rain. Ice cream sales don’t cause hot weather.
The more I wear suits, the more I speak on panels about SEO. Does it therefore follow that wearing suits gets me onto panels about SEO?
It’s critical to know that the data below, like data from other types of SEO tests, requires careful consideration and analysis. Parsing a bigger correlation as a direct sign that one should do X or Y more would be a fallacy.
Understanding Negative Correlation
In the research below, you’ll see a few data points where the correlation is actually negative, meaning that when we saw the element, it tended to predict lower placement in the results, rather than higher. For example:
The data for URL length shows that longer URLs are negatively correlated with ranking well. This isn’t particularly shocking, and it probably is wise to limit the length of our URLs if we want to perform well in the engines. However, the second data point on .com TLD extensions shouldn’t necessarily suggest that using .com as your top-level domain extension will actually negatively affect your rankings, but merely that all other things being equal, .com domains didn’t perform as well in the dataset we observed as other domain extensions.
As we go through each set below, we’ll try to explain our thinking, but certainly invite you to draw your own conclusions from the data.
As we’ve seen in the past, when more sophisticated ranking models are introduced, using machine learning against the search results, we often find that previously negative correlations turn out to be positive (or neutral) ranking factors.
That’s it! Let’s dive into the data.
Query Matching in the Domain Name
Our interpretation and conclusions:
- Exact match domains appear to continue their powerful level of influence in both search engines, though I think many SEOs will be surprised to see Google actually has a higher correlation with ranking exact match domains higher (when they appear on page 1 of the results) than Bing.
- Hyphenated exact matches certainly appear to be less influential, though they show up far more often in Bing’s results (Google: 271 results contained these vs. Bing: 890)
- Just having keywords in the domain name has substantive positive correlation (Thus, for example, if I wanted to rank for the word “dog,” the domain mydog.com would fit with this correlation point)
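The three categories above can be sketched roughly as follows. This is an illustration only, not the exact matching logic behind the charts: the function name and category labels are hypothetical, and it naively treats the label just before the TLD as the domain name, which would mishandle extensions like .co.uk.

```python
def domain_match_type(query, hostname):
    """Classify how a search query matches a hostname's root domain name."""
    words = query.lower().split()
    if not words:
        return "none"
    labels = hostname.lower().split(".")
    # Naive assumption: the label before the TLD is the root domain name.
    name = labels[-2] if len(labels) >= 2 else labels[0]
    if name == "".join(words):
        return "exact"
    if len(words) > 1 and name == "-".join(words):
        return "hyphenated_exact"
    if all(word in name for word in words):
        return "keywords_in_domain"
    return "none"

print(domain_match_type("dog food", "dogfood.com"))    # exact
print(domain_match_type("dog food", "dog-food.com"))   # hyphenated_exact
print(domain_match_type("dog", "mydog.com"))           # keywords_in_domain
```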
Exact Match Domains by TLD Extension
Our interpretation and conclusions:
- If you’re aiming for exact match, a .com extension is the way to go. Others aren’t nearly as well correlated.
- Bing does seem to appreciate non-dot-com exact matches more than Google, though not tremendously (especially in the case of .org)
Keywords in Subdomains
Our interpretation and conclusions:
- Keywords in subdomains aren’t nearly as powerful as in root domain names
- Bing may be rewarding subdomain keyword usage less than they have historically, though the result counts suggest that keyword subdomains show up on Bing’s page one much more frequently (Google: 673 vs. Bing: 1,394)
On-Page Keyword Usage
Our interpretation and conclusions:
- The alt attribute of images is interesting – our research last year found this as a peculiarity and it would appear to still be potentially useful in both engines (definitely worth some testing)
- Placing keywords in your URL string has some correlation with rankings on Google, though this is certainly a case where the “copy/paste” of URLs may be biasing this due to the accompanying anchor text benefits
- Note the placement of the “0” axis – some of these are negatively correlated, though not massively. All of the correlations are in a fairly narrow zone here.
- Everyone seems to be optimizing their title tags these days (appeared in Google: 11,115 vs. Bing: 11,143). Differentiating here is hard.
- Overall, simplistic on-page optimization doesn’t appear to be a huge factor.
Link Counts & Link Diversity
Our interpretation and conclusions:
- Links are still likely a major part of the algorithms. These numbers are among the highest we observe with any single metric.
- Bing may be slightly more naive in their usage of link data than Google, but appear to have improved since last year.
- Diversity of link sources remains more important than raw link quantity.
- Correlation numbers this high say good things about Linkscape’s Index – way to go engineering team!
TLD Extensions
Our interpretation and conclusions:
- This data gives us more reason to believe Google’s webspam chief, Matt Cutts, when he says .gov, .info and .edu are not special cased and don’t receive special bonuses or penalties to rankings
- The .org TLD extension is surprising – do these sites earn more links? Do they have less spam? Perhaps they tend to be less commercial and have an easier time garnering references? In any case, we’re happy to be SEOmoz.org!
- Don’t forget about the exact match data from above – .com is still probably a very good thing (at least own it if you’re using a different extension)
Length of Domain, URL & Content
Our interpretation and conclusions:
- Shorter URLs are likely a good best practice (especially on Bing)
- Long domains may not be ideal, but don’t seem awful
- Raw content length seems marginal in correlation, which fits with Matt Cutts’ advice from the Google I/O panel – “Don’t overfill your page with text for the sake of search engines. They don’t need a dissertation to decide to rank it highly; they want what the users want – for your site to be useful and informative.”
Website Homepages
Our interpretation and conclusions:
- Bing has a reputation for ranking homepages much more than Google does, and this appears to hold true in the correlation results – Bing’s correlation between homepages and higher rankings is about double Google’s (note that we included site.com/, site.com/index.*, site.com/default.* and site.com/home.* in these numbers)
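For anyone reproducing the homepage check, the URL patterns listed above could be matched with something like the following. It is a sketch under stated assumptions: the function name is hypothetical, matching is case-insensitive, and query strings are ignored, none of which is specified in the note above.

```python
import re
from urllib.parse import urlparse

# Matches "", "/", "/index.*", "/default.*" and "/home.*" at the site root.
HOMEPAGE_PATH = re.compile(
    r"^/?(?:(?:index|default|home)\.[^/]+)?$", re.IGNORECASE
)

def is_homepage(url):
    """Return True if the URL's path looks like a site homepage."""
    path = urlparse(url).path
    return bool(HOMEPAGE_PATH.match(path))

print(is_homepage("http://site.com/"))                 # True
print(is_homepage("http://site.com/default.aspx"))     # True
print(is_homepage("http://site.com/blog/index.html"))  # False
```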
Anchor Text Link Matches
Our interpretation and conclusions:
- Many anchor text links from the same domain likely don’t add much value
- Anchor text links from diverse domains, however, are one of our highest correlated metrics
- Bing seems more Google-like than in the past on handling exact match anchor links
Features w/ the Highest Correlation
Our interpretation and conclusions:
- Link attributes as a whole have much higher correlation with rankings than on-page or domain related elements
- Exact match is still a powerful influencer
- Google and Bing are remarkably similar – building two different sites/pages to separately target the two engines would appear to be a waste of energy
- Bing seems to be moving much closer to Google over time; although we didn’t measure all of these results precisely last year, the similarity of the two has dramatically increased (of course, it’s also possible that Google is getting more Bing-like, though this doesn’t fit with our personal experiences)
As with previous studies, I look forward to your analysis, hypotheses and data requests in the comments. Ben & I will both try to dive in to reply as we’re able over the next few days.