Today’s SEO tools make it easier for the non-programming SEO to perform complex tasks. This article is about the newest category of such tools that simplify a complex process: extracting data from the web.
I’ve been in SEO/digital marketing for more than a decade, and in that time I’ve seen a number of different technologies pop up that have opened new doors for SEOs.
Google Analytics is a great example of software making programming tasks doable for the non-programmer because it allows anyone to become a master of metrics without touching a line of code. Dashboards of pre-processed reports remove the need for log file analysis skills and have opened the door to metrics-driven SEO for a lot of non-techies.
Open Site Explorer is another personal favorite that has helped SEOs visualize and understand the link graph, letting them take a data-driven approach to their link building strategy.
Today, the latest tool to add to your SEO arsenal is the web scraper.
What is a web scraper?
A web scraper is a tool that allows you to take data from the web and transform it into a format that you can actually use and analyze. Traditionally, these tools have required the user to know a significant amount of code and to have a lot of time on their hands to figure out how to use them. However, new startups and technological advancements have made the process less complicated.
And now pretty much anyone can do at least a little web data scraping.
Why is web scraping useful?
Just as Google Analytics opened up a new world of website tracking insights to non-technical marketers, these new data scraping tools have opened up a new world of useful external data to the same audience, with almost limitless potential for
marketing awesomeness. Using web data will allow you to engage in more informed and more tactical SEO practices.
As with all new technology, however, getting your head around web data can be tough.
To that end, this guide is meant to do two things:
- Show you how to use my tool to extract data from any website as a CSV or API
- Teach you how to use the web data for SEO
How to extract data from the web
There are loads of different tools you could choose to help you with your data collection. They range quite widely in complexity and price. For most of the data you’ll likely want to get for SEO, I’d stay away from anything that says “for commercial use”. In general, you’re not going to be pulling enough data to make the cost worth it. In my opinion, you should be able to do at least 90 percent (if not more) of your web scraping without paying a thing.
This blog post by Gareth James has a pretty comprehensive list of the different web scrapers on the market. Choosing a tool is really about what you feel comfortable with. All of them require a little effort to master, so I recommend trying a few out and seeing what sticks. You’ll obviously want something that’s fast, reliable, cheap (or free), and relatively simple to use—with good support. But the most important thing to look for is a useful format you can access your data in. Because regardless of which tool you use, you’ll need to do some data cleaning and manipulating.
If you’re totally new to web scraping, and understandably a bit skeptical about spending time learning a new tool, try taking import.io’s Magic tool for a spin. (Full disclosure: I work for the company.)
Whether or not you choose to make an account with us or try our other tools, Magic is the only web-scraping tool that requires no input from you except a URL. It's completely free, and it should show you what web scraping can do without you having to spend much time learning a new system.
Here’s what happens when you extract data from Ikea.com using import.io Magic on mobile. You can see it has all the data I might need on there, including pictures, links, price, and title:
Once you have the web data, you can download it as a CSV or Excel doc, or integrate it with Google Sheets.
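Once the export is on disk, a few lines of Python are enough to start working with it. Here's a minimal sketch using only the standard library; the column names and sample rows are hypothetical stand-ins for whatever your scraper actually returns:

```python
import csv
import io

# A tiny inline sample standing in for a downloaded export
# (in practice you'd use: open("export.csv", newline=""))
sample_csv = """title,author,twitter,shares
10 Lead Generation Tips,Jane Doe,@janedoe,120
Ecommerce Growth Hacks,John Roe,@johnroe,85
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Sort by share count, highest first, to surface the most popular posts
rows.sort(key=lambda r: int(r["shares"]), reverse=True)
print(rows[0]["author"])  # the author of the most-shared post
```

The same `DictReader` approach works whether the file came from import.io, a Google Sheets export, or any other scraper that can output CSV.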
What can you do with web data for SEO?
The number of use cases for web data is really only limited by your creativity and time, but I’d be remiss if I didn’t give you some good ones to start with.
Make better outreach lists
If you follow Moz’s Whiteboard Friday videos and have watched this video (https://moz.com/blog/what-separates-a-good-outreach), then you know that to send a great outreach email/message, you need to do your homework and create a sense of being connected with the target. That is, you need to be able to engender a certain level of empathy and trust.
You can’t send the quality of email required without paring down your list to a small number of people so that you are almost one-on-one with each recipient.
The tactic I'm going to show you lets you curate a hyper-targeted list of authors/influencers to reach out to on a one-to-one basis, based on your keywords. All you need is a web scraper and a spreadsheet.
This method can be summarized as:
- Collect/scrape a list of authors’ contact details, along with the content they wrote
- Filter the table of authors (using an Excel spreadsheet I created) by searching for your keywords and comparing them to what each author writes about most
- Research and contact the relevant list of authors using a personal approach
The great thing about this method is that you can easily scrape emails and contact details guilt-free.
Step 1: Get data
Find the top media publishing sites in your niche, and use a crawler/scraper/API to make a list of authors’ contact details, including all the content they’ve written.
Get the following data from each page on the site:
- Page title
- Page content text
- Author name
- Author’s contact details (e.g., Twitter handle, email address, and LinkedIn page)
- How many shares/comments each post received
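To make the fields above concrete, here's a sketch of pulling them out of a single article page. The toy page, class names, and selectors are all hypothetical; a real page would be messier HTML that you'd feed to a proper HTML parser (such as BeautifulSoup) or have a tool like import.io extract for you:

```python
import xml.etree.ElementTree as ET

# A toy, well-formed article page. The class names are made up for
# illustration; real sites will use their own markup.
page = """<html>
  <head><title>10 Lead Generation Tips</title></head>
  <body>
    <div class="post">
      <span class="author">Jane Doe</span>
      <a class="twitter" href="https://twitter.com/janedoe">@janedoe</a>
      <p class="content">Lead generation is easier than you think.</p>
      <span class="shares">120</span>
    </div>
  </body>
</html>"""

root = ET.fromstring(page)

def text_of(cls):
    """Return the text of the first element with the given class attribute."""
    return root.find(f".//*[@class='{cls}']").text

# One row of the outreach dataset, matching the field list above
record = {
    "title": root.find(".//title").text,
    "author": text_of("author"),
    "twitter": text_of("twitter"),
    "content": text_of("content"),
    "shares": int(text_of("shares")),
}
print(record["author"], record["twitter"])
```

Run this for every article page on a site and you end up with one record per post: exactly the table the next step filters.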
You should have something like this:
Step 2: Filter data
Once you have a dataset of authors, articles, and contact details, you can download it and filter it down to target only the right people.
Filter your dataset by keywords in the titles and page content that relate to the content you have created (or want to create). This will significantly narrow down your list of potential authors to contact, making it possible to message them individually via email or Twitter, rather than spamming everyone on the list.
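If you'd rather do this filtering in code than in Excel, the logic is a simple case-insensitive match over the title and body fields. A sketch with made-up sample rows (field names mirror the hypothetical dataset above):

```python
# Sample scraped rows; authors, titles, and text are invented for illustration
rows = [
    {"author": "Bret Relander", "title": "Lead Generation on Social",
     "text": "Social media lead generation tactics for small teams."},
    {"author": "Jane Doe", "title": "Ecommerce Trends",
     "text": "Checkout optimisation and lead generation for stores."},
    {"author": "John Roe", "title": "Branding Basics",
     "text": "How to build a memorable brand."},
]

keyword = "lead generation"

# Keep only rows whose title or body mentions the keyword (case-insensitive)
matches = [
    r for r in rows
    if keyword in r["title"].lower() or keyword in r["text"].lower()
]

# De-duplicate into a final outreach list of author names
authors = sorted({r["author"] for r in matches})
print(authors)  # ['Bret Relander', 'Jane Doe']
```

Swapping the keyword, or matching a list of keywords, gives you a fresh targeted list from the same dataset without re-scraping anything.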
Here is a dataset template I quickly put together from a popular blog:
ExampleCSV.xlsx. Feel free to use it (unsupported).
A quick filter on the text field for “lead generation” brings up seven different articles that mentioned “lead generation” from seven different authors across these topics:
- E-commerce
- Tools
- Technology
- Growth strategies
- Marketing
- Content strategy
- Social media marketing
I would personally contact at least five of these authors, and I can see that Bret Relander would be at the top of my list, since he has written on two of these topics.
Scale up
Here is a link to the crawler I used:
https://import.io/data/set (feel free to clone it, and add to it).
Now I have five good, relevant authors/leads from one keyword, on one website, from one crawl. You can build crawlers for many sites and scale up your outreach database very quickly, giving you access to a curated list of contacts you know are current and searchable.
Step 3: Write your outreach email
I'm not going to preach about how to write a good content piece/pitch, other than to say this: if you are contacting an author to get your content published, approach them with well-structured content ideas, not the finished piece. This guide, though, is one I've found invaluable: http://www.slideshare.net/mikebutcher/how-to-deal-with-tech-media-by-mikebutcher.
Step 4: Contact people on the list
You can now contact your outreach prospects knowing that you have an audience that is already predisposed to be interested in your content. You will find that your lists will be much smaller, but much more effective.
A lot of authors will
only share their Twitter (or other social) handle, and not their email address. If this is the case, there are plenty of tools you can use to contact those people, too. Don’t limit yourself to email.
Do data journalism for SEO
The bread and butter of any good SEO (in my eyes) is creating original and interesting content that will attract visitors and links. Data journalism is all the rage today, and it's relatively easy, so why not grab some data and make a story?
Example 1: One data set can lead to something big
It is easy for all of us to look at datasets without context, seeing them as no more than well-structured words and numbers. Following a dataset through from its creation to the point where its impact can be measured requires both the right technology and a new way of seeing the possibilities of data on the web. Getting data from a website into a form that allows you to manipulate it and draw new conclusions is what our recent work with Oxfam was all about.
It turns out that one dataset can be the beginning of something big.
We recently gave our friends at Oxfam a dataset with all the men and women listed on Forbes' list of The World's Billionaires. With over 1,400 entries, we extracted data such as each billionaire's age, sex, country of citizenship, net worth, and source of wealth. On the site, it looked like any other window into the world of the rich and rare; in a dataset, it looked like any other spreadsheet.
But for us, and for Oxfam, it presented a golden opportunity to delve deeper and look at the disparity between our country's rich and poor. By combining our dataset with their own internal data, Oxfam found that Britain's five richest families are worth more than the poorest 20 percent of the entire UK population. (Read the full story.)
What our work with Oxfam shows is that data can be a catalyst, giving us reason to ask new questions about the world in which we live. If you can do that as an SEO, your content will be a link magnet, because these are the kinds of campaigns that get mainstream media attention.
Example 2: Challenge ‘The Man’ with prostitution data and journalism
You don’t always have to work with a major organization or have your own internal data. You can draw plenty of conclusions from your own research and analysis. A friend of mine did just that and got mentioned by the BBC, The Guardian, The Mirror, and The Star.
By taking an existing news piece and applying his own data analysis to it, he was able to challenge the conclusions of the ONS report on UK GDP. This allowed him to produce two pieces of content: “How much does prostitution contribute to the UK economy?” and the follow up “Gender differences amongst sex workers online”, both of which were well-received in the mainstream media.
His analysis of the data he crawled from only one website (the largest escorting site in the UK) shows just how powerful information on the web can be when used as fuel for stories. He tweeted his findings to a BBC reporter, and within a day it was published in several mainstream media sites, generating a monstrous traffic spike.
If you’re trying to generate a really big dataset, as in the above example, you’re going to want to use a crawler, which is a bot that will travel to each page of the site and extract data from relevant pages.
I’m not going to show you how to make a crawler right here, but it’s pretty easy. (See import.io’s
crawler how-to and Scrapinghub’s Crawlera tool.)
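The core of any crawler is the same regardless of tool: keep a queue of pages to visit, fetch each one, extract data, and add any new links you find back to the queue. Here's a sketch of that traversal logic over a toy in-memory "site" (a real crawler would fetch each URL over HTTP and parse the links out of the HTML; the page structure below is invented):

```python
from collections import deque

# A toy in-memory site: page URL -> the links found on that page.
# This stands in for fetching and parsing real pages.
site = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/blog/post-2", "/"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/blog", "/about"],
    "/about": ["/"],
}

def crawl(start):
    """Breadth-first crawl: visit every reachable page exactly once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real crawler would extract data here
        for link in site.get(page, []):
            if link not in seen:  # never revisit a page
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/"))
```

The `seen` set is the important part: it's what stops the crawler from looping forever on sites whose pages all link back to each other, which is essentially every site.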
Your turn
No matter how technical you are, or what your role is in your organization, I encourage you to experiment with a few of these tools and see for yourself how awesome accessing web data can be.
What creative ways can you think of to use web data?