
Separating Web Spam from Quality Content – What are the Metrics?

Let’s try a little exercise…

Common features of spam domains include:

  • Long domain names
  • .info, .cc, .us and other cheap, easy to grab TLDs
  • Short registration period (1 year, maybe 2)
  • High ratio of ad blocks to content
  • JavaScript redirects from initial landing pages
  • Use of common, high-commercial-value spam keywords like “mortgage,” “poker,” “Texas hold ’em,” “porn,” “student credit cards,” and related terms
  • Many links to other low quality, spam sites
  • Few links to high quality, trusted sites
  • High keyword frequencies and keyword densities
  • Small amounts of unique content
  • Very few direct visits
  • Very few links to the site sent in (non-spam) email
  • Registered to people/entities not associated with trusted sites
  • Not frequently registered with services like Yahoo! Site Explorer, Google Webmaster Central or Live Webmaster Tools
  • Rarely have short, high value domain names
  • Often contain many keyword-stuffed subdomains
  • More likely to have longer domain names
  • More likely to contain multiple hyphens in the domain name
  • Less likely to have links from trusted sources
  • Less likely to have SSL certificates
  • Less likely to be in directories like DMOZ, Yahoo!, Librarian’s Internet Index, etc.
  • Unlikely to have any significant quantity of branded searches
  • Unlikely to be bookmarked in services like My Yahoo!, Del.icio.us, Faves.com, etc.
  • Unlikely to get featured on social voting sites like Digg, Reddit, Yahoo! Buzz, StumbleUpon, etc.
  • Unlikely to have channels on YouTube, communities on Facebook or links from Wikipedia
  • Unlikely to be mentioned on major news sites (either with or without link attribution)
  • Unlikely to register with Google/Yahoo!/MSN Local Services
  • Unlikely to have a legitimate physical address/phone number on the website 
  • Likely to have the domain associated with email addresses on blacklists
  • Often contain a large number of snippets of “duplicate” content found elsewhere on the web
  • Unlikely to contain unique content in the form of PDFs, PPTs, XLSs, DOCs, etc.
  • Frequently feature commercially focused content
  • Many levels of links away from highly trusted websites
  • Rarely contain privacy policy and copyright notice pages
  • Rarely listed in Better Business Bureau’s Online Directory
  • Rarely contain high-grade-level text content (as measured by metrics like the Flesch-Kincaid Grade Level)
  • Rarely have small snippets of text quoted on other websites and pages
  • Cloaking based on user-agent or IP address is common
  • Rarely contain paid analytics tracking software
  • Rarely have online or offline marketing campaigns
  • Rarely have affiliate link programs pointing to them
  • Less likely to have .com or .org extensions
  • Almost never have .mil, .edu or .gov extensions
  • Rarely have links from domains with .edu or .gov extensions
  • Almost never have links from domains with .mil extensions
  • Rarely receive high quantities of monthly visits
  • Rarely have visits lasting longer than 30 seconds
  • Rarely have visitors bookmarking their domains in the browser
  • Unlikely to buy significant quantities of PPC ad traffic
  • Rarely have banner ad media buys
  • Likely to have links to a significant portion of the sites and pages that link to them
  • Extremely unlikely to be mentioned or linked-to in scientific research papers
  • Unlikely to use expensive web technologies (e.g., Microsoft server and development products that require a licensing fee)
  • Likely to be registered by parties who own a very large number of domains
  • Unlikely to attract significant return traffic
  • More likely to contain malware, viruses or spyware (or any automated downloads)
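Several of the signals in the list above are simple enough to compute directly from a domain name or a page's text: domain length, hyphen count, keyword-stuffed subdomains, cheap TLDs and keyword density. The sketch below is a minimal illustration only, not how any search engine actually scores sites. The function names, the `CHEAP_TLDS` set and the naive "last two labels" domain parsing (which mishandles multi-part suffixes like .co.uk) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical set of "cheap, easy to grab" TLDs, taken from the examples in the list.
CHEAP_TLDS = {"info", "cc", "us"}


def domain_features(domain: str) -> dict:
    """Compute a few domain-name signals (naive parsing: assumes a single-label TLD)."""
    labels = domain.lower().strip(".").split(".")
    tld = labels[-1]
    # The registered name is assumed to be the label just before the TLD.
    registered = labels[-2] if len(labels) >= 2 else labels[0]
    return {
        "name_length": len(registered),          # long names are a spam signal
        "hyphens": registered.count("-"),        # multiple hyphens are a spam signal
        "subdomains": max(len(labels) - 2, 0),   # many subdomains can indicate stuffing
        "cheap_tld": tld in CHEAP_TLDS,
    }


def keyword_density(text: str, keyword: str) -> float:
    """Share of words in `text` equal to `keyword` (single-word keywords only)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)
```

For example, `domain_features("cheap-texas-holdem-poker.info")` reports a 24-character name with 3 hyphens on a cheap TLD. The post gives no thresholds, so deciding what counts as "long" or "high density" is left to your own judgment.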

For high quality content domains, the opposite is true (at least, for a good percentage of these). Now think about the sites you’re building – which features apply to them? What could you do differently to be more like the “high quality” category and less like the “spam”?
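When auditing your own sites, the reading-level item from the list is one signal you can measure yourself. The Flesch-Kincaid grade level is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The minimal sketch below assumes a crude vowel-group syllable counter and punctuation-based sentence splitting, so its scores will differ somewhat from dedicated readability tools; the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid grade level of `text`."""
    # Sentences are assumed to end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A page of very short, simple sentences such as "The cat sat on the mat." scores about −1.45. By the list's logic, consistently low scores lean toward the spam profile, though the post sets no cut-off.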

BTW, I’d love to hear your take on features you think are common to spam or to high-quality sites.
