December brought us the latest piece of algorithm update fun. Google rolled out an update which was quickly named the Maccabees update and the articles began rolling in (SEJ , SER).
The webmaster complaints began to come in thick and fast, and I began my normal plan of action: to sit back, relax, and laugh at all the people who have built bad links, spun out low-quality content, or picked a business model that Google has a grudge against (hello, affiliates).
Then I checked one of my sites and saw Iβd been hit by it.
Hmm.
Time to check the obvious
I didnβt have access to a lot of sites that were hit by the Maccabees update, but I do have access to a relatively large number of sites, allowing me to try to identify some patterns and work out what was going on. Full disclaimer: This is a relatively large investigation of a single site; it might not generalize out to your own site.
My first point of call was to verify that there werenβt any really obvious issues, the kind which Google hasnβt looked kindly on in the past. This isnβt any sort of official list; it’s more of an internal set of things that I go and check when things go wrong, and badly.
Dodgy links & thin content
I know the site well, so I could rule out dodgy links and serious thin content problems pretty quickly.
(For those of you who’d like some pointers on the kinds of things to check for, follow this link down to the appendix! There’ll be one for each section.)
Index bloat
Index bloat is where a website has managed to accidentally get a large number of non-valuable pages into Google. It can be sign of crawling issues, cannabalization issues, or thin content problems.
Did I call the thin content problem too soon? I did actually have some pretty severe index bloat. The site which had been hit worst by this had the following indexed URLs graph:
However, Iβd actually seen that step function-esque index bloat on a couple other client sites, who hadnβt been hit by this update.
In both cases, weβd spent a reasonable amount of time trying to work out why this had happened and where it was happening, but after a lot of log file analysis and Google site: searches, nothing insightful came out of it.
The best guess we ended up with was that Google had changed how they measured indexed URLs. Perhaps it now includes URLs with a non-200 status until they stop checking them? Perhaps it now includes images and other static files, and wasnβt counting them previously?
I havenβt seen any evidence that itβs related to m. URLs or actual index bloat β I’m interested to hear peopleβs experiences, but in this case I chalked it up as not relevant.
Poor user experience/slow site
Nope, not the case either. Could it be faster or more user-friendly? Absolutely. Most sites can, but Iβd still rate the site as good.
Overbearing ads or monetization?
Nope, no ads at all.
The immediate sanity checklist turned up nothing useful, so where to turn next for clues?
Internet theories
Time to plow through various theories on the Internet:
- The Maccabees update is mobile-first related
- Nope, nothing here; itβs a mobile-friendly responsive site. (Both of these first points are summarized here.)
- E-commerce/affiliate related
- Iβve seen this one batted around as well, but neither applied in this case, as the site was neither.
- Sites targeting keyword permutations
- I saw this one from Barry Schwartz; this is the one which comes closest to applying. The site didnβt have a vast number of combination landing pages (for example, one for every single combination of dress size and color), but it does have a lot of user-generated content.
Nothing conclusive here either; time to look at some more data.
Working through Search Console data
Weβve been storing all our search console data in Googleβs cloud-based data analytics tool BigQuery for some time, which gives me the luxury of immediately being able to pull out a table and see all the keywords which have dropped.
There were a couple keyword permutations/themes which were particularly badly hit, and I started digging into them. One of the joys of having all the data in a table is that you can do things like plot the rank of each page that ranks for a single keyword over time.
And this finally got me something useful.
The yellow line is the page I want to rank and the page which Iβve seen the best user results from (i.e. lower bounce rates, more pages per session, etc.):
Another example: again, the yellow line represents the page that should be ranking correctly.
In all the cases I found, my primary landing page β which had previously ranked consistently β was now being cannabalized by articles Iβd written on the same topic or by user-generated content.
Are you sure itβs a Google update?
You can never be 100% sure, but I havenβt made any changes to this area for several months, so I wouldnβt expect it to be due to recent changes, or delayed changes coming through. The site had recently migrated to HTTPS, but saw no traffic fluctuations around that time.
Currently, I donβt have anything else to attribute this to but the update.
How am I trying to fix this?
The ideal fix would be the one that gets me all my traffic back. But thatβs a little more subjective than βI want the correct page to rank for the correct keyword,β so instead thatβs what Iβm aiming for here.
And of course the crucial word in all this is βtryingβ; Iβve only started making these changes recently, and the jury is still out on if any of it will work.
No-indexing the user generated content
This one seems like a bit of no-brainer. They bring an incredibly small percentage of traffic anyway, which then performs worse than if users land on a proper landing page.
I liked having them indexed because they would occasionally start ranking for some keyword ideas Iβd never have tried by myself, which I could then migrate to the landing pages. But this was a relatively low occurrence and on-balance perhaps not worth doing any more, if Iβm going to suffer cannabalization on my main pages.
Making better use of the Schema.org “About” property
Iβve been waiting a while for a compelling place to give this idea a shot.
Broadly, you can sum it up as using the Schema.org About property pointing back to multiple authoritative sources (like Wikidata, Wikipedia, Dbpedia, etc.) in order to help Google better understand your content.
For example, you might add the following JSON to an article an about Donald Trumpβs inauguration.
[ { "@type": "Person", "name": "President-elect Donald Trump", "sameAs": [ "https://en.wikipedia.org/wikiDonald_Trump", "http://dbpedia.org/page/Donald_Trump", "https://www.wikidata.org/wiki/Q22686" ] }, { "@type": "Thing", "name": "US", "sameAs": [ "https://en.wikipedia.org/wiki/United_States", "http://dbpedia.org/page/United_States", "https://www.wikidata.org/wiki/Q30" ] }, { "@type": "Thing", "name": "Inauguration Day", "sameAs": [ "https://en.wikipedia.org/wiki/United_States_presidential_inauguration", "http://dbpedia.org/page/United_States_presidential_inauguration", "https://www.wikidata.org/wiki/Q263233" ] } ]
The articles Iβve been having rank are often specific sub-articles about the larger topic, perhaps explicitly explaining them, which might help Google find better places to use them.
You should absolutely go and read this article/presentation by Jarno Van Driel, which is where I took this idea from.
Combining informational and transactional intents
Not quite sure how I feel about this one. Iβve seen a lot of it, usually where there exist two terms, one more transactional and one more informational. A site will put a large guide on the transactional page (often a category page) and then attempt to grab both at once.
This is where the lines started to blur. I had previously been on the side of having two pages, one to target the transactional and another to target the informational.
Currently beginning to consider whether or not this is the correct way to do it. Iβll probably try this again in a couple places and see how it plays out.
Final thoughts
I only got any insight into this problem because of storing Search Console data. I would absolutely recommend storing your Search Console data, so you can do this kind of investigation in the future. Currently Iβd recommend paginating the API to get this data; itβs not perfect, but avoids many other difficulties. You can find a script to do that here (a fork of the previous Search Console script Iβve talked about) which I then use to dump into BigQuery. You should also check out Paul Shapiro and JR Oakes, who have both provided solutions that go a step further and also do the database saving.
My best guess at the moment for the Maccabees update is there has been some sort of weighting change which now values relevancy more highly and tests more pages which are possibly topically relevant. These new tested pages were notably less strong and seemed to perform as you would expect (less well), which seems to have led to my traffic drop.
Of course, this analysis is currently based off of a single site, so that conclusion might only apply to my site or not at all if there are multiple effects happening and Iβm only seeing one of them.
Has anyone seen anything similar or done any deep diving into where this has happened on their site?
Appendix
Spotting thin content & dodgy links
For those of you who are looking at new sites, there are some quick ways to dig into this.
For dodgy links:
- Take a look at something like Searchmetrics/SEMRush and see if theyβve had any previous penguin drops.
- Take a look into tools Majestic and Ahrefs. You can often get this free, Majestic will give you all the links for your domain for example if you verify.
For spotting thin content:
- Run a crawl
- Take a look at anything with a short word count; letβs arbitrarily say less than 400 words.
- Look for heavy repetition in titles or meta descriptions.
- Use the tree view (that you can find on Screaming Frog, for example) and drill down into where it has found everything. This will quickly let you see if there are pages where you donβt expect there to be any.
- See if the number of URLs found is notably different to the indexed URL report.
- Soon you will be able to take a look at Googleβs new index coverage report. (AJ Kohn has a nice writeup here).
- Browse around with an SEO chrome plugin that will show indexation. (SEO Meta in 1 Click is helpful, I wrote Traffic Light SEO for this, doesnβt really matter what you use though.)
Index bloat
The only real place to spot index bloat is the indexed URLs report in Search Console. Debugging it however is hard, I would recommend a combination of log files, βsite:β searches in Google, and sitemaps when attempting to diagnose this.
If you can get them, the log files will usually be the most insightful.
Poor user experience/slow site
This is a hard one to judge. Virtually every site has things you can class as a poor user experience.
If you donβt have access to any user research on the brand, I will go off my gut combined with a quick scan to compare to some competitors. Iβm not looking for a perfect experience or anywhere close, I just want to not hate trying to use the website on the main templates which are exposed to search.
For speed, I tend to use WebPageTest as a super general rule of thumb. If the site loads below 3 seconds, Iβm not worried; 3β6 Iβm a little bit more nervous; anything over that, Iβd take as being pretty bad.
I realize thatβs not the most specific section and a lot of these checks do come from experience above everything else.
Overbearing ads or monetization?
Speaking of poor user experience, the most obvious one is to switch off whatever ad-block youβre running (or if itβs built into your browser, to switch to one without that feature) and try to use the site without it. For many sites, it will be clear cut. When itβs not, Iβll go off and seek other specific examples.