I recently wrote a post on hacking together a linkbuilding tool, where I set myself the challenge of learning a bunch of new technologies in 2 hours in order to build a basic version. I learnt just enough YQL, xpath, Python and Google App Engine to do the job. Since then I’ve put this to use in at least one tool that’s actually helping me and my team do our jobs better.
Inspired by this (and encouraged by Kate Morris, a recent addition to the Distilled team), I started putting together a cheatsheet of the basic YQL and xpath I had learnt. In the end, it turned into that plus a collection of APIs and datasets that could make great starting points for tools (either for research or for creating linkworthy content):
Download it: API and data cheatsheet
Or link to it: API and datasource cheatsheet [PDF]
Or tweet it!
I wanted to create the kind of thing that I’d find useful to have around for inspiration and quick memory-jogs. So I focused on three areas:
Sources
APIs
I have been enjoying digging through Programmable Web to find great APIs that do cool things. The two I’m currently most excited about are:
- Face.com – just for pure awesomeness. I haven’t actually tried it yet, but a face recognition API? Are you kidding me?
- Alchemy – for the time-saving ability of extracting visible text from a page. This is the kind of thing I don’t want to have to code myself for sure.
Data sources
In addition to tools that do cool things, sometimes you need input data. Some of the APIs are designed to give you data, others manipulate data, but sometimes you just need that raw data. In addition to having one of the coolest names around (maybe I’m just a sucker for chimps), infochimps, which catalogues data sets around the web, is perhaps also one of the coolest sites on the web. It has everything from the 1,000 most frequently used English words to Trst Rank for Twitter users [data] (check out their big datasets if you really want to get your Hadoop on).
Magic
As I discussed in my last post, I’m not a developer. My code is testament to that. I therefore love stuff that makes my life easier. Re-using work that other smart people did was cheating at school, but is a hugely valuable life skill when you are actually trying to get real stuff done. There are a small number of bits of syntax for YQL and xpath that I keep needing to look up, so I included them in the cheatsheet.
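To give a flavour of the kind of syntax I mean, here is a minimal Python sketch. It is not lifted from the cheatsheet: the sample markup, the example.com URLs and the helper names `extract_links` and `build_yql` are all made up for illustration. It uses the limited xpath support in Python's standard library to pull links out of a well-formed snippet, and it only builds a YQL statement as a string (in the style of the `html` table queries) without sending it anywhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a fetched page.
SAMPLE = """
<html>
  <body>
    <div id="content">
      <a href="http://example.com/one">One</a>
      <p>Some text with a <a href="http://example.com/two">second link</a>.</p>
    </div>
    <div id="footer">
      <a href="http://example.com/about">About</a>
    </div>
  </body>
</html>
""".strip()


def extract_links(markup, xpath=".//a"):
    """Return (href, anchor text) pairs for every element the xpath matches."""
    root = ET.fromstring(markup)
    return [
        (a.get("href"), "".join(a.itertext()).strip())
        for a in root.findall(xpath)
    ]


def build_yql(url, xpath):
    """Build a YQL statement as a plain string; no request is made."""
    return f"select * from html where url=\"{url}\" and xpath='{xpath}'"


# Every link in the snippet.
all_links = extract_links(SAMPLE)

# Only links inside the div with id="content".
content_links = extract_links(SAMPLE, ".//div[@id='content']//a")

# The equivalent query expressed as a YQL statement.
query = build_yql("http://example.com/", '//div[@id="content"]//a')

print(all_links)
print(content_links)
print(query)
```

The standard library only understands a subset of xpath and needs well-formed markup, so for real, messy pages you would want a proper HTML parser. For jogging the memory on `//`, `[@id='...']` and friends, though, it does the job.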
Horsepower
You could do all this stuff yourself. Or you could get a computer to do it. The final column outlines the tools I have used for different kinds of tasks:
- Mozenda: best for one-off site scraping and rapid proof of concept
- 80legs: best for rapid development of well-defined tasks
- Google App Engine: best for combinations of ease-of-use and flexibility. Great for accessing APIs. Better for beginners than:
- Amazon Web Services: best for experts and production code
Sometimes things just have to be done by humans, but that doesn’t mean it necessarily has to be you doing it. I have included some links to my favourites, but Rand’s post on outsourceable SEO tasks is the place to start reading for an introduction.
Inspiration
One of the sources of inspiration for this post has been reading on DataWrangling about the work of Peter Skomoroch who is a research scientist at LinkedIn (and whose delicious links are included in the cheatsheet). I love this presentation on the creation of TrendingTopics.org:
At some point, I will loop back around and update this with more API links etc. In the meantime, another API I’ve come across is the Wordstream API, which gives a load more keyword juiciness to your API fun.
If you liked this, I’d love a tweet or a link: API and datasource cheatsheet [PDF]