
How to Use Robots.txt and Redirects the Wrong Way

I was inspired by Rebecca Kelley’s post about Newbie Mistakes, and two and a half newbie mistakes instantly came to my mind. They are a bit on the technical side of things, but not so much that they can’t be understood by all of you mozzers 🙂

1. Robots.txt is no security layer

As we all know, clever webmasters provide a robots.txt to prevent selected content of their site from being crawled. But one should always be aware that a robots.txt is no more than a recommendation to search engines not to crawl those pages.
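
Keep in mind that the file itself is public: anyone can fetch yourdomain.com/robots.txt and read it like a treasure map of the very URLs you wanted to hide. A made-up example (all paths are hypothetical):

    # robots.txt - publicly readable by anyone, not just by crawlers
    User-agent: *
    Disallow: /secret-stats/    # an obedient bot skips this; a curious human won't
    Disallow: /run-update.php   # now everyone knows this script exists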

Thankfully, the popular search engines follow the recommendation and don’t crawl them. But there are exceptions: nasty, evil, and unlikeable search engines, and – of course – curious people like me. I recently came across the robots.txt of a Spanish website:

[Screenshot: the site’s weird robots.txt]

I immediately thought: “Estadisticas? Statistics? Wtf?” and typed the URL into my browser – and voilà, I saw a neat AWStats interface exposing all of the site’s statistics.

Well, that’s bad, since it may reveal data you’d rather not share, but it got even worse when I got to the second entry, “actualizador.php.” Just by accessing the page, I accidentally started a huge database update script, slowing down the whole website for at least a minute. A script that kicks off heavy work on a simple GET request is where it gets really bad.

Conclusion #1:

Do create a robots.txt, but fill it only (only!) with URLs someone – including you – might be linking to, and make sure none of them start huge tasks or reveal confidential data. If you can’t place those pages in a secure section of your site, for whatever reason, be sure not to mention them in the robots.txt. Keep them in your pocket instead (or, better yet, your head!).
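
If those pages have to stay reachable over the web, put real access control in front of them instead. Here’s a minimal sketch for Apache using HTTP Basic Auth (file names and paths are made up for illustration):

    # .htaccess inside the directory you want to protect, e.g. /secret-stats/
    AuthType Basic
    AuthName "Statistics"
    # the password file, created with: htpasswd -c /home/example/.htpasswd youruser
    AuthUserFile /home/example/.htpasswd
    Require valid-user

Now a visitor (or a crawler that ignores your robots.txt) gets a 401 instead of your statistics.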

2. Redirecting done wrong

When it comes to redirecting by means of HTTP status codes, I’m sure you know that 301 is your friend, because only a 301 passes PageRank on to the page you’re redirecting to. However, some sites just don’t get it. Apart from using 302 instead of 301, some sites also use way too many redirects. The worst I’ve seen recently is www.websingles.at, a big website by Austrian standards. Type in the URL and you’ll get the following redirects:

  • 302 from http://www.websingles.at to http://www2.websingles.at
  • 302 from http://www2.websingles.at to http://www2.websingles.at/pages/site/de
  • 301 from http://www2.websingles.at/pages/site/de to http://www2.websingles.at/pages/site/de/ (note the trailing slash)

This results in their actual start page, http://www2.websingles.at/pages/site/de/, having no PageRank at all (http://www.websingles.at has a PageRank of 4 that never gets passed on, by the way) and, on the technical side, in three redirects just to reach the start page. The way to go would be:

  • 301 from http://www.websingles.at to http://www2.websingles.at/pages/site/de/

Basta!
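
In Apache, for instance, a single mod_alias directive in the configuration of the www host would do the job (a sketch only – I obviously don’t know their actual setup):

    # in the VirtualHost for www.websingles.at (hypothetical config)
    # one permanent redirect, straight to the final URL
    Redirect 301 / http://www2.websingles.at/pages/site/de/

Note that Redirect matches by prefix, so deeper paths on the old host get appended to the target; if you want to catch only the root, RedirectMatch 301 ^/$ is the stricter variant.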

Conclusion #2:

Of course, there are many situations that call for redirects, but my personal rule of thumb is to use a 301 for:

  1. Moved pages
  2. Mistyped URLs
  3. Forwarding to localized start pages

Watch out if you notice:

  1. Redirects firing when you click on internal links. You know your own URL structure, so just link to the right URL!
  2. An important page of yours having no PageRank – there might be a 302 involved. (You can check a redirect chain yourself; see the sketch below.)
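
Checking a redirect chain takes only a few lines. Here’s a quick sketch in Python, using the third-party requests library (the URL is just the example from above):

    # pip install requests
    import requests

    # Follow all redirects and print every hop along the way.
    response = requests.get("http://www.websingles.at", allow_redirects=True)
    for hop in response.history:                 # intermediate 3xx responses
        print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
    print(response.status_code, response.url)    # the final destination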

Hope you enjoyed my findings – and of course, feel free to post your thoughts in the comments!
