“I’m going to register one hundred domains and link them all to my main website.”
This notion hits most people in their early stages of discovering how search engine algorithms work. Many people give in to temptation and test it out, and some have even built elaborate schemes that appear to work for a while.
But now the bad news. At Dejan SEO we have analysed large quantities of link data collected between 2005 and 2011 to observe the results of this approach… and the results are not pretty. Trying to artificially influence Google’s search algorithm by manipulating the link graph and its signals is doomed to fail in a fiery ball of penalties, wasted time and depleted funds.
Humans are inherently lazy. When we opt in to a link scheme, we are clearly after a shortcut. Unfortunately, these very shortcuts leave tangible footprints and common patterns that can be easily detected by an algorithm or through human review.
Content Networks
Let’s start at the beginning – content networks. Content networks are the bamboo of the web: numerous, fast-growing and consumed by Panda.
Here’s the basic content network approach:
- Register n domains
- Buy cheap hosting with WHM
- Batch install a CMS (typically WordPress)
- Get content
  - Cheap writing
  - Spinning
  - RSS feeds
  - Scraping
- Link up sites
- Flow PageRank
- Sell links / link your main site / collect AdSense money, etc.
The cost of the setup is quite low when implemented at an unsophisticated level. This is why we have seen so many low-quality sites in Google’s results in the last five years. Once you start diversifying IP addresses, sourcing quality content and implementing design variations, the running costs of your operation escalate tenfold. And there are no guarantees that your scheme will succeed in the long term.
Advanced Content Networks
In 2011, Panda sorted out most of these artificial content network sites. Some are still out there though and are still ranking well. Is this a sign of bamboo farmers getting more sophisticated?
Many content networks have reached a transitional stage where their content websites are becoming more real, and in some cases even useful, which makes them more difficult to discern. Difficult, but not impossible.
Google is determined to kill off bamboo farming and has cranked up its use of a wide range of elements to determine what may be a network of websites or an artificial link scheme designed to manipulate rankings.
If you still want to try your hand at bamboo farming, it is pretty much guaranteed that you will be tripped up by at least one of these signals.
Open Invitations
Many networks have open invitations for link buying, link selling, link exchanges, and paid blogging. They also talk on their pages about PageRank increases. For spam cops, this is a no-brainer and an easy way to flag an artificial content network.
Domain-Level Information
Google is a domain name registrar and has access to significant amounts of data about any domain. As a result, Google can compare domain registration data including TLD consistency, domain naming patterns, ownership, and contact details.
Other data Google can track includes registration length and domain history, including registrant changes and changes to the domain’s content and topic over time.
And buying and using expired domains for your sites won’t help. Expired domains are a common occurrence and Google can determine whether new ownership brings with it a new site, or the previous owner restoring the same website on a re-registered domain.
And don’t think that private registration for all your domains is a solution. It provides another pattern which, in combination with other signals, can flag your scheme.
Hosting-Level Information
Apart from the obviously similar hosting characteristics a content network may have (e.g. hosting company, IP address, server type and geolocation), Google can also compare name server information and C-blocks. This means that bamboo farmers must diversify their name servers and spread their sites across a wider range of IP addresses, which is costly and time-consuming to run.
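To see why shared C-blocks stand out, here is a minimal Python sketch that groups a handful of domains by the first three octets of their IPv4 address. The domain names and addresses are made up for illustration, and this is only a sketch of the idea, not a description of how Google actually does it.

```python
from collections import defaultdict


def c_block(ip: str) -> str:
    """Return the first three octets of an IPv4 address (its C-block)."""
    return ".".join(ip.split(".")[:3])


def group_by_c_block(hosts: dict[str, str]) -> dict[str, list[str]]:
    """Map each C-block to the sorted list of domains hosted in it."""
    groups = defaultdict(list)
    for domain, ip in hosts.items():
        groups[c_block(ip)].append(domain)
    return {block: sorted(domains) for block, domains in groups.items()}


# Hypothetical data: addresses taken from reserved documentation ranges.
hosts = {
    "site-a.example": "192.0.2.10",
    "site-b.example": "192.0.2.57",
    "site-c.example": "192.0.2.200",
    "site-d.example": "198.51.100.4",
}

groups = group_by_c_block(hosts)
# Keep only the C-blocks that host more than one domain.
shared = {block: names for block, names in groups.items() if len(names) > 1}
print(shared)
```

Three of the four sample sites land in the same C-block, which is exactly the kind of cluster a cheap single-host setup produces.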
Content Characteristics
Google’s index is sophisticated and has the capacity to track changes over time. This means that websites are observed for historical content changes and frequency of updates.
Natural websites grow gradually and continue to grow over time. Artificial websites tend to generate content quickly and then slow down (unless automated content schemes are used). Since the Panda updates, content duplication, spinning and automation are being aggressively filtered.
Other content giveaways come from simple things. Google holds a large database of addresses and other business/organisation contact information through Google Places, Maps and related services, and compares this data with information on suspect sites. Blog networks typically do not have useful “About us” pages, staff profiles, contact details, phone numbers or location maps.
Another way to detect fakes is to observe the topical consistency and diversity of content. Qualitative analysis can be tricky, but it is not impossible. Google is already able to determine reading level and the presence or absence of citations and references, and can sort content into commercial, blog, news, forum, social network and academic types. Additional identification may come from the use of images and media (including file naming conventions). If your content is not consistent, then you may be flagged as a fake.
And a clear flag of a content network is the lack of any call to action (e.g. join, buy, subscribe, connect, rate, etc.).
Link Signals
Google’s algorithm is based on links, which means Google understands links really well. It analyses internal links and 301 redirects, looks for hidden links and observes all outbound links on your site. Send the wrong link signal and the writing is on the wall for your site network.
Outbound Links
So what are some of the outbound link signals? Outbound links leave a solid footprint when attempting to manipulate rankings, starting from the anchor text you use. If you consistently use “exact match phrase” links and lack non-keyword links such as bare URLs, “click here” and “read more”, you may be flagged. The location of links on the page and the ratio of follow to nofollow links may raise another flag.
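As a rough illustration of the anchor text footprint, the following Python sketch measures what share of a site's outbound anchors are exact-match phrases, bare URLs or generic phrases. The sample anchors, the `GENERIC_ANCHORS` list and the labels are assumptions made up for this example; nothing here reflects a known Google threshold or method.

```python
from collections import Counter

# Assumed list of generic anchors; a real classifier would be far richer.
GENERIC_ANCHORS = {"click here", "read more", "here", "website"}
LABELS = ("exact", "url", "generic", "other")


def classify_anchor(anchor: str, money_phrases: set[str]) -> str:
    """Label an anchor as 'exact', 'url', 'generic' or 'other'."""
    text = anchor.strip().lower()
    if text in money_phrases:
        return "exact"
    if text.startswith(("http://", "https://", "www.")):
        return "url"
    if text in GENERIC_ANCHORS:
        return "generic"
    return "other"


def anchor_profile(anchors: list[str], money_phrases: set[str]) -> dict[str, float]:
    """Return the share of each anchor type among the given anchors."""
    counts = Counter(classify_anchor(a, money_phrases) for a in anchors)
    total = len(anchors) or 1  # avoid dividing by zero on an empty list
    return {label: counts[label] / total for label in LABELS}


# Hypothetical outbound anchors pointing at one target site.
anchors = [
    "cheap blue widgets",
    "cheap blue widgets",
    "Cheap Blue Widgets",
    "https://widgets.example/",
    "click here",
]
profile = anchor_profile(anchors, {"cheap blue widgets"})
print(profile)
```

In this sample, three of the five anchors are exact-match phrases. A naturally linked site would usually show a far more mixed profile.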
One of the more interesting hints comes from monitoring abnormal link placements and removals. By abnormal I mean those that do not fit in a normal pattern of link lifetime on the web. Break the pattern and you could trigger a flag.
Inbound Links
Inbound link signals are becoming more common. Google now looks at “How trustworthy are the inbound links?” and “What are the topics of pages and websites that are linking in to this site?” If your main sources of inbound links are hacked sites, forum spam and blog comment spam, you don’t have a chance. Not in the long run anyway.
Other hints can come from the quantity and diversity of inbound links, including the velocity at which links are placed and spikes in link removals.
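A velocity spike is easy to picture with a small example. The Python sketch below flags any month whose count of new inbound links is more than three times the median of all earlier months. The monthly figures and the factor of three are arbitrary assumptions chosen for illustration, not known values from any search engine.

```python
from statistics import median


def velocity_spikes(monthly_new_links: list[int], factor: float = 3.0) -> list[int]:
    """Return indexes of months whose new-link count exceeds
    `factor` times the median of all earlier months."""
    spikes = []
    for i in range(1, len(monthly_new_links)):
        baseline = median(monthly_new_links[:i])
        if baseline > 0 and monthly_new_links[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical counts of new inbound links per month.
history = [12, 15, 11, 14, 160, 13, 12]
print(velocity_spikes(history))
```

Only the fifth month (index 4) is flagged here. The same check could be run on link removals to catch the reverse pattern.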
Related Websites
On a higher level, all available link metrics combined help Google paint a picture of your websites. Google understands the genealogy of your domains via cross-site interlinking patterns, cascading PageRank flow and PageRank source commonalities.
Site Architecture & Technical Elements
Unless you manually create all your websites in a number of different ways, using varying technologies, you will always leave a footprint. Elements Google looks at when determining whether a bamboo farm exists include consistency in CMS platforms; use of consistent themes and plug-ins; URL structures, including URL rewriting rules; and page extensions (e.g. php, htm, html, aspx).
And even if everything is manually set, some type of recycling is likely to occur. Common areas for recycling include the navigational level, CSS classes, and file naming conventions. Often a simple way to find out whether a website is part of a network is to look at its footer, as footers are regularly duplicated or overlooked in the coding.
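The footer check can be sketched in a few lines of Python. This example pulls the CSS class names out of each site's footer markup and compares them pairwise using Jaccard similarity (shared classes divided by all classes). The markup, the domain names and the 0.8 cut-off are all invented for illustration, and a regular expression is used only to keep the sketch short; a real crawler would use a proper HTML parser.

```python
import re
from itertools import combinations

# Matches double-quoted class attributes only; enough for this sketch.
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def css_classes(html: str) -> set[str]:
    """Collect every class name used in class="..." attributes."""
    names = set()
    for value in CLASS_ATTR.findall(html):
        names.update(value.split())
    return names


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical footer markup from three sites.
footers = {
    "site-a.example": '<div class="ftr-wrap ftr-col"><p class="ftr-copy">A</p></div>',
    "site-b.example": '<div class="ftr-wrap ftr-col"><p class="ftr-copy">B</p></div>',
    "site-c.example": '<footer class="site-footer"><p class="legal">C</p></footer>',
}

# Pairs of sites whose footers reuse (almost) the same set of classes.
similar = [
    (x, y)
    for x, y in combinations(sorted(footers), 2)
    if jaccard(css_classes(footers[x]), css_classes(footers[y])) >= 0.8
]
print(similar)
```

The first two sites share an identical set of footer classes despite different visible text, so they are paired up, while the third site stays on its own.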
Social Media
Social media has become a major focus for Google in 2011. Google looks to social media to verify people and businesses, and to provide clues on the influence of the person or business. With the addition of Google+, the importance of social media in flagging potential spam and validating worthwhile resources will only grow.
Google’s Own Data
Google holds immense amounts of browsing and behaviour data about users and websites. Combine high bounce rates from search results with other flags, including those generated by its link graph analysis algorithm, and it becomes relatively easy to identify fake or manipulative websites.
Competitors
People will always try to cheat the system. One of the biggest threats to a bamboo farm is competitors who are eager to push you down in the results. No matter how clever your setup, there is a chance that your scheme will be picked up and reported to Google by a competitor.
Upon quality review by Google, your website may be penalised. Google spam cops keep notes, and require you to explain in detail what you have done to fix the problem before your reconsideration request for your website will be successful.
This process also renders your entire scheme useless, as you cannot use it again. You either have to find a new scheme or do things the right way the next time.
Conclusion
Gaming Google’s algorithm is getting harder day by day. Google has already stated its intention to improve the evaluation of content through authorship signals, better assessment of the social graph and improved understanding of the semantic qualities of content on the web. In order to create a bulletproof link scheme you now need to invest as much time, money and energy as you would going the white-hat SEO way.
What to do? Continue investing in sustainable practices and expand content development capacity within the team. Speak to your link builders and ensure they can recognise fake sites. They could be building you a bamboo castle as you read this. Do SEO the right way and sleep easy at night knowing your links are safe.