Let’s try a little exercise…
Common features of spam domains include:
- Long domain names
- .info, .cc, .us and other cheap, easy-to-grab TLDs
- Short registration period (1 year, maybe 2)
- High ratio of ad blocks to content
- JavaScript redirects from initial landing pages
- Use of common, high-commercial-value spam keywords like “mortgage,” “poker,” “texas hold ’em,” “porn,” “student credit cards,” and related terms
- Many links to other low quality, spam sites
- Few links to high quality, trusted sites
- High keyword frequencies and keyword densities
- Small amounts of unique content
- Very few direct visits
- Very few links sent out in (non-spam) email to the site
- Registered to people/entities not associated with trusted sites
- Not frequently registered with services like Yahoo! Site Explorer, Google Webmaster Central or Live Webmaster Tools
- Rarely have short, high value domain names
- Often contain many keyword-stuffed subdomains
- More likely to have longer domain names
- More likely to contain multiple hyphens in the domain name
- Less likely to have links from trusted sources
- Less likely to have SSL security certificates
- Less likely to be in directories like DMOZ, Yahoo!, Librarian’s Internet Index, etc.
- Unlikely to have any significant quantity of branded searches
- Unlikely to be bookmarked in services like My Yahoo!, Del.icio.us, Faves.com, etc.
- Unlikely to get featured in social voting sites like Digg, Reddit, Yahoo! Buzz, StumbleUpon, etc.
- Unlikely to have channels on YouTube, communities on Facebook or links from Wikipedia
- Unlikely to be mentioned on major news sites (either with or without link attribution)
- Unlikely to register with Google/Yahoo!/MSN Local Services
- Unlikely to have a legitimate physical address/phone number on the website
- Likely to have the domain associated with emails on blacklists
- Often contain a large number of snippets of “duplicate” content found elsewhere on the web
- Unlikely to contain unique content in the form of PDFs, PPTs, XLSs, DOCs, etc.
- Frequently feature commercially focused content
- Many levels of links away from highly trusted websites
- Rarely contain privacy policy and copyright notice pages
- Rarely listed in Better Business Bureau’s Online Directory
- Rarely contain high grade level text content (as measured by metrics like Flesch-Kincaid Reading Level)
- Rarely have small snippets of text quoted on other websites and pages
- Cloaking based on user-agent or IP address is common
- Rarely contain paid analytics tracking software
- Rarely have online or offline marketing campaigns
- Rarely have affiliate link programs pointing to them
- Less likely to have .com or .org extensions
- Almost never have .mil, .edu or .gov extensions
- Rarely have links from domains with .edu or .gov extensions
- Almost never have links from domains with .mil extensions
- Rarely receive high quantities of monthly visits
- Rarely have visits lasting longer than 30 seconds
- Rarely have visitors bookmarking their domains in the browser
- Unlikely to buy significant quantities of PPC ad traffic
- Rarely have banner ad media buys
- Likely to have links to a significant portion of the sites and pages that link to them
- Extremely unlikely to be mentioned or linked to in scientific research papers
- Unlikely to use expensive web technologies (Microsoft Server & Coding Products that Require a Licensing Fee)
- Likely to be registered by parties who own a very large number of domains
- Unlikely to attract significant return traffic
- More likely to contain malware, viruses or spyware (or any automated downloads)
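A few of the domain-name features in this list (long names, multiple hyphens, cheap TLDs, keyword-stuffed subdomains) can be checked mechanically. Here is a minimal Python sketch of what that might look like. The function name `domain_signals`, the 20-character cutoff and the keyword sample are all assumptions made for illustration, not thresholds any search engine has published, and the hostname split is deliberately naive.

```python
# Assumption: this set simply mirrors the cheap TLDs named in the list above.
CHEAP_TLDS = {"info", "cc", "us"}
# Assumption: a small sample of the spam keywords named in the list.
SPAM_KEYWORDS = ("mortgage", "poker", "porn", "credit")
# Assumption: 20 characters is an arbitrary cutoff for a "long" name.
LONG_NAME_CHARS = 20


def domain_signals(hostname):
    """Return the domain-name features from the list that match hostname."""
    labels = hostname.lower().strip(".").split(".")
    tld = labels[-1]
    # Naive split: the last label is treated as the TLD and the one before it
    # as the registered name, so suffixes such as .co.uk are mishandled.
    name = labels[-2] if len(labels) > 1 else ""
    subdomains = labels[:-2]

    signals = []
    if len(name) > LONG_NAME_CHARS:
        signals.append("long domain name")
    if name.count("-") >= 2:
        signals.append("multiple hyphens")
    if tld in CHEAP_TLDS:
        signals.append("cheap TLD")
    if any(k in sub for sub in subdomains for k in SPAM_KEYWORDS):
        signals.append("keyword-stuffed subdomain")
    return signals


print(domain_signals("cheap-poker-mortgage-deals-online.info"))
print(domain_signals("example.com"))
```

No single match means much on its own. Plenty of legitimate sites live on .us or use a hyphen or two, so treat the output as a count of warning signs, not a verdict.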
For high quality content domains, the opposite is true (at least, for a good percentage of these). Now think about the sites you’re building – which features apply to them? What could you do differently to be more like the “high quality” category and less like the “spam”?
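If you want to put a number on one of these for your own pages, keyword density is an easy place to start. The sketch below assumes density means a word's share of all words on the page, which is one common definition and not necessarily what any search engine measures. The function name `keyword_density` and the simple word-splitting rule are illustrative choices.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    # Assumption: a "word" is any run of letters or apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


for word, share in keyword_density("poker poker poker chips"):
    print(f"{word}: {share:.0%}")
```

Run it over the visible text of a page. If one commercial term accounts for a large share of the words, the page probably reads more like the spam list than the high quality one.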
BTW – I’d love to hear your take on features you think are common to spam, or to high quality sites.