This past week during the SMX Advanced conference in Seattle, I presented some correlation data alongside Janet Driscoll-Miller, Sasi Parthasarathy of Bing & Matt Cutts of Google. Matt in particular was quite vocal in expressing a desire to see additional data points from our research, primarily around the prominence/visibility of particular elements in the results. This post is intended to help make that available.
I must say that I don’t agree with Matt on the importance of the raw visibility/counts over the ranking correlations. My feeling is that SEOs in these spaces are more interested in answering the question – “what features predict a result will rank higher vs. lower on page 1?” – rather than the more straightforward – “does this feature appear more frequently on page 1 at Google or Bing?” However, I certainly agree that both are relevant and interesting.
If you’re trying to work out how this prominence/visibility data differs from our earlier data on correlation with rankings, here’s how we’d best describe it:
- Correlation w/ rankings data helps to answer the question, “when this feature appears in results on the first page of Google/Bing, which engine ranks it higher, and by how much?” Those correlation numbers were derived by looking at the likelihood that a result would rank above another when it contained the target attribute.
- Visibility/prominence of an element helps to answer the question, “is this element more likely to appear on the first page of Google’s/Bing’s results?” This simply counts the number of times we saw a result (or multiple results) ranking on page 1 containing the target attribute.
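To make the distinction concrete, here’s a rough Python sketch of the two questions. It’s one possible reading of “the likelihood that a result would rank above another when it contained the target attribute,” not the exact statistic we computed. The function names are hypothetical, each SERP is assumed to be a list of URLs in rank order, and a feature is any yes/no test on a URL.

```python
from itertools import combinations
from typing import Callable, List, Optional

Feature = Callable[[str], bool]

def pairwise_win_rate(serp: List[str], has_feature: Feature) -> Optional[float]:
    """Share of 'mixed' pairs (one result has the feature, one doesn't)
    in which the result with the feature ranks higher.

    Hypothetical illustration of the ranking-correlation question.
    Returns None when no pair of results differs on the feature.
    """
    wins = total = 0
    # combinations() keeps list order, so `higher` always outranks `lower`
    for higher, lower in combinations(serp, 2):
        f_hi, f_lo = has_feature(higher), has_feature(lower)
        if f_hi == f_lo:
            continue  # both or neither have the feature: uninformative pair
        total += 1
        if f_hi:
            wins += 1
    return wins / total if total else None

def appears_on_page_one(serp: List[str], has_feature: Feature) -> bool:
    """The visibility question: does any page 1 result have the feature?"""
    return any(has_feature(url) for url in serp)
```

A win rate near 0.5 means the feature isn’t associated with higher or lower positions within page 1, even if it appears on page 1 very often, which is why the two measures can tell different stories.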
We’re looking at the latter in this post, but before we dive in, there are a few critical items to understand:
- This isn’t correlation data, and there are no standard error or standard deviation figures here. It’s simply the number of times we saw the element in the results we gathered, divided by the total number of results (SERPs or URLs, depending on the chart), expressed as a percentage.
- This data comes from page 1 of the results for 11,351 searches, with queries taken from Google’s AdWords categories. This means the terms and phrases vary somewhat in search volume (from under 100 searches per month to tens or hundreds of thousands) but generally have a commercial focus and intent. They generally don’t include brand names, long-tail phrases or vanity-name searches. Overall, we picked them because they’re precisely the kinds of queries most SEOs care about when they’re doing competitive SEO for their companies and clients. We also ignored the second result in a SERP from the same domain to avoid the effects of indented results (which was important for our earlier statistics, but not for those in this post).
- The results were collected the week of May 31st and thus include post-“Mayday” update SERPs and likely results from after the “Caffeine” launch as well. Google did not announce exactly when that rollout occurred, though it may not have much bearing, as Caffeine is reportedly an infrastructure change rather than an algorithmic one.
- Each feature has two pie charts: one showing the percentage of SERPs that contained at least one URL with the feature, and another showing the percentage of all URLs across all results (102,296 for Google and 109,966 for Bing; note that the number of standard web results shown on page 1 varies from SERP to SERP). These are labeled “(feature) in SERPs” and “(feature) in URLs,” respectively.
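For anyone reproducing the charts from the raw data, here’s a minimal Python sketch of the two percentages. It rests on a few assumptions of ours: each SERP is a list of page 1 URLs in rank order, a feature is a yes/no test on a URL, and “same domain” is approximated by an identical hostname (so www.example.com and blog.example.com count as different). The function names are hypothetical.

```python
from typing import Callable, List, Tuple
from urllib.parse import urlsplit

def dedupe_same_domain(serp: List[str]) -> List[str]:
    """Drop results whose hostname already appeared higher on the page
    (an approximation of ignoring indented same-domain results)."""
    seen, kept = set(), []
    for url in serp:
        host = urlsplit(url).hostname or ""
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept

def prominence(serps: List[List[str]],
               has_feature: Callable[[str], bool]) -> Tuple[float, float]:
    """Return (% of SERPs with at least one matching URL,
               % of all URLs that match). Inputs must be non-empty."""
    serps = [dedupe_same_domain(s) for s in serps]
    serps_with = sum(any(has_feature(u) for u in s) for s in serps)
    urls = [u for s in serps for u in s]
    urls_with = sum(has_feature(u) for u in urls)
    return 100 * serps_with / len(serps), 100 * urls_with / len(urls)
```

The first number drives the “(feature) in SERPs” chart and the second the “(feature) in URLs” chart.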
We didn’t design our data collection with this kind of presentation in mind. In fact, Ben & I both feel that if we wanted to present it this way, we should have gathered the first 3-5 pages of results, not just the 1st page. That way, one could compare the counts on page 1 with the counts on page 2. However, since we’ve got the data and Matt, Sasi and several other folks expressed interest, we’re sharing it anyway. Hopefully in the future we can do more on this front.
Let’s dive in!
Exact Match Domains
These are domains that precisely matched the keywords in the query – e.g. for the query “dog collars” only a domain that matched *.dogcollars.* would be included.
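As a rough illustration, the exact match test can be written as a comparison between the query (spaces removed) and the domain’s second-level label. This Python sketch assumes a single-label TLD such as .com or .org; something like example.co.uk would need the Public Suffix List to be handled correctly. The optional `tld_filter` and `hyphenated` parameters are our own hypothetical names, covering the .com/.net/.org and hyphenated variants discussed below.

```python
from typing import Optional
from urllib.parse import urlsplit

def domain_label(url: str) -> str:
    """Second-level label of the host, e.g. 'dogcollars' for www.dogcollars.com.
    Assumes a single-label TLD."""
    labels = (urlsplit(url).hostname or "").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]

def tld(url: str) -> str:
    """Last host label, e.g. 'com'."""
    return (urlsplit(url).hostname or "").rsplit(".", 1)[-1]

def is_exact_match(url: str, query: str,
                   tld_filter: Optional[str] = None,
                   hyphenated: bool = False) -> bool:
    """True if the domain label equals the query with spaces removed
    (or joined by hyphens when hyphenated=True), optionally for one TLD only."""
    joiner = "-" if hyphenated else ""
    target = joiner.join(query.lower().split())
    if tld_filter is not None and tld(url) != tld_filter:
        return False
    return domain_label(url) == target
```

Any subdomain is allowed, which mirrors the *.dogcollars.* pattern above.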
You can see that Bing has slightly more SERPs with at least one exact match domain result, and slightly more exact match domains in the overall count of results (all the URLs from all the SERPs).
Exact Match .com Domains
Similar to exact match domains, exact match .com domains had to contain the exact query in the domain name and have a .com TLD extension.
Again, Bing showed a slight preference for displaying results from these sites in the SERPs and URLs we observed.
Exact Match .net Domains
As above, but replace “.com” with “.net.”
The two engines are much closer in the total number of URLs with a .net exact match, but Bing shows a preference in the SERP count.
Exact Match .org Domains
In the .org TLDs, we start to see a bit of what we observed in the ranking correlation data:
This is the first exact match domain TLD where Google actually had more SERPs containing a result of this type. Bing, however, had very slightly more URLs with this feature.
Exact Hyphenated Match Domains
One of Matt Cutts’ complaints centered on how Google vs. Bing handled exact hyphenated match domains. In the ranking correlations, it appeared that when Google listed them on the first page of results, it ranked them higher than Bing did when they appeared there. However…
As I called out in the presentation and the prior post, Bing has quite a few more SERPs where exact hyphenated match domains appear, and somewhat more URLs, too. This is another data point that should make us all think carefully about the fallacy of presuming correlation = causation. Bing might have a preference for exact hyphenated match domains, but the ranking correlations suggest to me there’s more going on here: maybe something to do with anchor text, where those types of sites tend to get links, or something else we haven’t considered.
It’s critical to keep in mind that we’re just looking at individual factors here – not trying to explain why they exist or correlate (at least, not in the data).
Results that Include All Keywords in the Domain Name
Here we looked for domains that contained the keyword query in the domain, even if the match wasn’t exact. For example, mydogcollar.com would now match for the phrase “dog collar.”
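A sketch of this looser check in Python, again assuming a single-label TLD: each query word just has to appear somewhere in the domain’s second-level label. This post doesn’t say whether word order mattered, so this version (with hypothetical function names) ignores order.

```python
from urllib.parse import urlsplit

def domain_label(url: str) -> str:
    """Second-level label, e.g. 'mydogcollar' for www.mydogcollar.com
    (assumes a single-label TLD)."""
    labels = (urlsplit(url).hostname or "").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]

def has_all_keywords_in_domain(url: str, query: str) -> bool:
    """True if every query word is a substring of the domain label."""
    label = domain_label(url)
    return all(word in label for word in query.lower().split())
```

Every exact match domain also passes this test, so the two features overlap.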
Again, it’s Bing that shows a higher number of these types of domains in their results.
Results that Include All Keywords in the Subdomain Name
We’ve previously shown some data suggesting that subdomains might have some ranking influence, but not as much as root domains (this was done using our rank modeling / machine learning process). Here’s some raw data on the number of times we observed keyword matching subdomains:
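Here’s a minimal sketch of how a keyword-matching subdomain could be detected, assuming the subdomain is everything to the left of the last two host labels (so, again, a single-label TLD). It’s illustrative only, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

def subdomain_part(url: str) -> str:
    """Host labels left of the registrable domain, e.g. 'dogcollars'
    for dogcollars.example.com (assumes a single-label TLD)."""
    labels = (urlsplit(url).hostname or "").split(".")
    return ".".join(labels[:-2])

def has_all_keywords_in_subdomain(url: str, query: str) -> bool:
    """True if a subdomain exists and contains every query word."""
    sub = subdomain_part(url)
    return bool(sub) and all(word in sub for word in query.lower().split())
```

Keywords in the root domain (www.dogcollars.com) don’t count here, which keeps this feature separate from the domain-name features above.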
Perhaps not surprisingly, Bing again is showing more of these results in their SERPs and individual URLs.
.com Domains
For this feature and all the TLDs below, we’re just looking at any URL that has the domain extension.
It looks like Bing has very slightly more .coms in their results vs. Google.
.org Domains
Let’s see what happens for .org domains, recalling Google’s apparent preference for them in the ranking correlations.
Oddly, Bing again seems to have more .org pages in the SERPs and URLs.
.net Domains
URLs with .net probably won’t surprise you much:
Yet again, Bing shows slightly more than its Googly competitor.
.edu Domains
Recall how, in the correlation data, the numbers were small(ish) but negative? Let’s see what the number of results shows:
True to the stereotype, Google is slightly ahead on number of .edu domains in the SERPs & URLs.
.gov Domains
Given the previous charts, this one likely won’t surprise you:
Google has more .edus and more .govs, too.
Keywords in the Title Element
Not surprisingly, nearly every set of SERPs had at least one result where the title tag contained the keywords:
Bing has more results whose title tags contain the keywords. One thing worth mentioning is that we didn’t observe the titles the engines chose to show, but rather the page titles from the results themselves. Hence, if a result was showing a DMOZ title or a brand title (which Google will sometimes insert), we ignored that and looked only at the title element on the page itself.
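Since the check ran against the page’s own title element rather than the snippet the engines displayed, it works on the fetched HTML. This Python sketch uses the standard library’s `html.parser`; it also collects H1 text and image alt attributes, which are relevant to the H1 and alt attribute sections below. The rule that every query word must appear as a case-insensitive substring is our assumption, as are the class and function names.

```python
from html.parser import HTMLParser

class OnPageText(HTMLParser):
    """Collects the page's <title> text, each <h1>'s text and img alt values."""

    def __init__(self):
        super().__init__()
        self.title, self.h1, self.alts = "", [], []
        self._capturing = None  # 'title' or 'h1' while inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._capturing = tag
            if tag == "h1":
                self.h1.append("")  # start a new H1 entry
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # skip missing or empty alt attributes
                self.alts.append(alt)

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self._capturing = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b> within <h1>) is still captured
        if self._capturing == "title":
            self.title += data
        elif self._capturing == "h1":
            self.h1[-1] += data

def contains_all_keywords(text: str, query: str) -> bool:
    """Case-insensitive: every query word must be a substring of text."""
    text = text.lower()
    return all(word in text for word in query.lower().split())
```

Substring matching is deliberately naive: the query “dog collar” matches text containing “dog collars,” but the query “dog collars” doesn’t match an alt text of “red dog collar.”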
Keywords in the URL
This one actually surprised me, if only because there were even fewer results with keywords in the URL than in the title!
Bing again has more results with keyword-matching URLs, though remember that some of that is probably from keyword matching domains, too.
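Because keyword-matching domains overlap with this feature, it can help to check the path separately from the host. Here’s a minimal Python sketch that assumes the same all-words substring rule and percent-decodes the URL first; `keywords_in_url` and its `path_only` parameter are hypothetical names.

```python
from urllib.parse import unquote, urlsplit

def keywords_in_url(url: str, query: str, path_only: bool = False) -> bool:
    """True if every query word appears in the URL's host + path
    (or in the path alone when path_only=True), case-insensitively."""
    parts = urlsplit(url)
    text = parts.path if path_only else (parts.hostname or "") + parts.path
    text = unquote(text).lower()  # decode %20 etc. before matching
    return all(word in text for word in query.lower().split())
```

Comparing the host + path count against the path-only count would show how much of this feature is really the domain feature.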
Keywords in the H1
The ranking correlations suggested that the H1 tag isn’t much of a differentiator, yet lots of people still swear by it:
The results bear out that this is a much less frequent feature than keywords in URLs or titles among pages ranking on page 1. Bing does seem to show more of them than Google, though.
Keywords in the Alt Attribute
Alt attributes looked interesting last fall when we collected ranking information, and once again proved worth a look in the correlation data from SMX Advanced. Let’s see what the raw counts show:
Bing is showing slightly more of these, but if the positive correlation means something, these numbers certainly suggest there’s lots of opportunity left for good alt attribute practices.
Homepages
Who lists homepages vs. deep pages in the results more?
My word! It’s Google by a good margin. Bing’s preference for internal pages actually surprises me a bit, though perhaps that’s an old stereotype I need to abandon.
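For reference, a homepage test might look like this Python sketch. This post doesn’t spell out what counted as a homepage, so the rule here (root path or a common index file, with no query string) is an assumption, as is the function name.

```python
from urllib.parse import urlsplit

# Assumed set of paths treated as "the homepage"
HOMEPAGE_PATHS = {"", "/", "/index.html", "/index.htm", "/index.php"}

def is_homepage(url: str) -> bool:
    """True for root URLs like http://example.com/ ; False for deep pages
    and for root URLs carrying a query string."""
    parts = urlsplit(url)
    return not parts.query and parts.path.lower() in HOMEPAGE_PATHS
```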
And with that, we’re done!
Note that I haven’t included data on links, as these would be hard to interpret and likely not useful. Every page of results had pages with links to them, and nearly every individual ranking URL also had links (a good sign for Linkscape’s index, but not super valuable as a data point). A few other data points like this wouldn’t make sense here (keyword prominence in the body tag, word tokens in the body tag, domain name length, etc.) and have thus been excluded.
I’ve done less analysis on these results in general, as I think this data is less well suited to the purpose, but it’s still interesting and, hopefully, illustrative of general prominence. I look forward to seeing your interpretations and discussion!
p.s. If you email Ben at SEOmoz dot org, he will send you a TSV containing, for each query, the metrics we used in these posts for each result. You can also find raw results in a public Google spreadsheet doc here. Feel free to play around and let us know if you see anything else cool and interesting.