I just got a PRO account with SEOmoz and watched the canonicalization video. It was impressive except for the https part. The video said that for https you should redirect bots to http and otherwise show the https page, since https pages are meant for users, not for bots. I can somewhat agree, but I still wouldn't recommend doing that on its own; I have never recommended bot-specific programming. I faced a similar problem a few years back. Let's see how to handle https for canonicalization as well as duplicate content.
Let’s take the duplicate content first:
As Rand explained, https pages are for humans (especially for security, as the communication between the browser and the server is encrypted), so let's block the https version of the site for bots using robots.txt and meta tags. Here is how you do it with .htaccess (there are many other ways, too):
- Create a file called robots_ssl.txt in your document root (a sample is shown after this list).
- Add this to your .htaccess:

    RewriteCond %{SERVER_PORT} 443 [NC]
    RewriteRule ^robots\.txt$ robots_ssl.txt [L]
- Remove domain.com:443 from your webmaster console if the pages are already crawled.
- If you are using dynamic pages (PHP, for example), try:

    <?php
    if ($_SERVER["SERVER_PORT"] == 443) {
        echo '<meta name="robots" content="noindex,nofollow">';
    }
    ?>
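In case it helps, this is all the robots_ssl.txt from the first step needs to contain if you want to keep every crawler out of the https version (the file name is simply the one mapped by the rewrite rule above):

    User-agent: *
    Disallow: /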
This will ensure that https doesn't exist for search engines, and it will take care of the duplicate content.
Now let's take the canonicalization part of it. This is a little tricky. Let me put the important points down in no particular order:
- Make sure that your important content pages are available over http. (Whether to have https at all is a tough call and beyond the scope of this article, as I would have to explain the problems of shifting visitors from http to https while browsing.) Generally I recommend keeping only the transactional sections (which usually have little content and matter less from a search engine point of view) under https; the rest of the site can be served over http.
- Here the only concern is links. We need to make sure people link to the http pages, not the https ones. Keep a separate log file for the https domain, which is easy to set up, and write a small script that mails you the referring links every day or week, or check the log manually (a sketch of such a script follows this list). Contact the webmasters with a polite mail and ask them to change the links to http (this has worked for me many times). Also make sure the normal sections are served mainly over http, so the chance of people linking to https is reduced.
- Have a little patience with Google's algorithms. I am sure Google will soon (if it doesn't already) count https links towards the http pages as well, so the occasional link that goes to https can act as a positive (natural) signal.
- I think having https (with a certificate from VeriSign or any other firm) sends, or will send, a positive signal that you are serious about user data.
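Here is a minimal sketch of the referral-check script mentioned above. It assumes the https virtual host writes an Apache combined-format log to /var/log/apache2/ssl_access.log, that your own hostname is www.example.com and that PHP's mail() works on the server; all of those names are placeholders, so adjust them to your setup and run the script from cron.

    <?php
    // Collect external referrers from the https access log and mail them.
    // Assumes the Apache combined log format, where the referrer is the
    // second-to-last quoted field on each line.
    $lines = file('/var/log/apache2/ssl_access.log');
    $referrers = array();
    foreach ($lines as $line) {
        if (preg_match('/"([^"]*)" "[^"]*"$/', trim($line), $m)) {
            $ref = $m[1];
            // Skip empty referrers and internal navigation on our own site.
            if ($ref != '-' && strpos($ref, 'www.example.com') === false) {
                $referrers[$ref] = true;
            }
        }
    }
    if (count($referrers) > 0) {
        mail('you@example.com',
             'Sites linking to the https pages',
             implode("\n", array_keys($referrers)));
    }
    ?>

Once you have the list, a quick look each week is usually enough to spot the handful of webmasters worth mailing.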
There is more to write about how to use https effectively for a better user experience while taking care of dumb search engines.
Some related resources: