Introducing VisualRank
When you search for McDonalds on Google’s Image Search, Google won’t scan the pictures in its index looking for visual clues linked to McDonalds. Instead, it does what most other search engines do:
- It looks at the text surrounding the image itself (caption, paragraph text, titles, etc…)
- It reads the filename
- It reads the ALT attributes of the images
This works OK. But not great. And that is understandable: when you can’t see the actual image, you can only make an educated guess about what it contains.
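To make that concrete, here is a minimal sketch (in Python) of what a purely text-driven image ranker could look like. The field names and weights are invented for illustration; Google’s actual signals and scoring are obviously not public.

```python
# Rough sketch of text-only image ranking: the image pixels are never
# inspected, only the text that happens to surround the file.
# Field names and weights are invented for illustration.

def text_only_score(query, image_meta):
    """Score an image for a query using only textual signals."""
    terms = query.lower().split()
    # Hypothetical weights: a hit in the ALT attribute or filename
    # counts more than a hit in the surrounding text.
    fields = {
        "surrounding_text": 1.0,   # caption, paragraph text, titles
        "filename": 2.0,
        "alt_attribute": 3.0,
    }
    score = 0.0
    for field, weight in fields.items():
        text = image_meta.get(field, "").lower()
        score += weight * sum(text.count(term) for term in terms)
    return score

images = [
    {"filename": "mcdonalds-sign.jpg",
     "alt_attribute": "McDonalds golden arches",
     "surrounding_text": "Photo of a McDonalds restaurant at night."},
    {"filename": "IMG_0042.jpg",
     "alt_attribute": "",
     "surrounding_text": "We stopped at McDonalds on the way home."},
]

ranked = sorted(images, key=lambda m: text_only_score("mcdonalds", m), reverse=True)
print([m["filename"] for m in ranked])
```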
Google apparently understands this and wants to advance the technology. That is why two Googlers presented a paper entitled PageRank for Product Image Search (link to PDF). Their method improves on classical image search by adding two crucial factors:
- Image recognition for similarity analysis
- A ranking computation (based on the original PageRank algorithm) to produce the actual rankings
How does it actually work?
First, a set of pictures is retrieved for a particular search, and the images are then analysed using image recognition in search of visual similarities between them. From those similarities a visual pattern is established (for instance, the McDonalds golden arches appear in a lot of the images returned for the query “mcdonalds”).
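The paper derives these similarities from local image features (the kind of descriptors that fire on distinctive regions like the golden arches). As a simplified stand-in, here is a sketch that builds a pairwise similarity matrix using cosine similarity over one feature vector per image; the real system matches local descriptors, so treat this as a rough approximation of the idea.

```python
import numpy as np

def pairwise_similarity(features):
    """Cosine similarity between every pair of image feature vectors.

    `features` is an (n_images, n_dims) array. The paper derives these
    weights from local-feature matches; a single descriptor per image
    with cosine similarity is a deliberate simplification.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)      # an image should not "vote" for itself
    return np.clip(sim, 0.0, None)  # keep only non-negative similarities

# Toy example: 4 images, 8-dimensional descriptors.
rng = np.random.default_rng(0)
feats = rng.random((4, 8))
S = pairwise_similarity(feats)
print(S.round(2))
```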
Once such similarities are found and the pattern has been established, a “link graph” is created between the images, generating a “visual link structure” in which similarities point to the most relevant images.
From what I gathered, Google will still start with a text-based search (to find the starting set of images); once these are found, they are clustered into similar groups and the visual graph is generated. This is understandable, since doing a visual analysis of billions of images, without a pre-screening process, would use vast amounts of computing power (see section 2.2, Query Dependent Ranking).
By using this method, Google hopes to reach better relevancy for its image search. For instance, with the search “mona lisa” they hope to retrieve the original painting, i.e. an image that depicts the Mona Lisa in the most “original” way. They start by clustering the results for “mona lisa” from classical image search, then build a graph of visual similarities to find the “perfect Mona Lisa image.” That would be the original painting, and not a Warhol variation, for instance.
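From my reading, the ranking step is essentially PageRank run over that similarity matrix: normalize it so each image distributes its “vote” among the images it resembles, mix in a damping factor, and iterate until the scores settle; the image with the highest score is the most “central” one in the visual graph. A minimal sketch (the damping value, tolerance, and toy numbers are my own choices, not taken from the paper):

```python
import numpy as np

def visual_rank(S, damping=0.85, tol=1e-9, max_iter=1000):
    """PageRank-style scores over a visual similarity matrix S.

    S[i, j] is the similarity between images i and j (non-negative,
    zero diagonal). Columns are normalized so each image distributes
    its "vote" across the images it resembles; damping and tolerance
    are illustrative choices.
    """
    n = S.shape[0]
    col_sums = S.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # avoid division by zero
    P = S / col_sums                # column-stochastic transition matrix
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = damping * P @ scores + (1 - damping) / n
        done = np.abs(new - scores).sum() < tol
        scores = new
        if done:
            break
    return scores

# Toy graph: image 0 resembles everything strongly (the "canonical"
# Mona Lisa), image 3 resembles nothing much (a loose variation).
S = np.array([
    [0.0, 0.9, 0.8, 0.3],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.2],
    [0.3, 0.2, 0.2, 0.0],
])
scores = visual_rank(S)
print(scores.round(3), "-> top image:", int(scores.argmax()))
```

In this toy matrix, image 0 resembles all the others most strongly, so it ends up on top, just as the original painting would in the Mona Lisa example.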
Although no actual hyperlinks are analysed, the visual link graph seems to work very well at weeding out irrelevant results. When I read the paper, that seemed to be the principal objective if it is ever implemented in regular Google Image Search.
VisualRank, therefore, would not be as easily gamed as PageRank, since it depends mostly on “visual hyperlinks” that are generated by the image search engine itself. But the researchers already point to a way to produce “biased results”: by introducing many duplicate images (image spam?).
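To see why duplicates are a plausible attack, note that every near-identical copy adds another heavily weighted edge around the same content, pulling the random walk towards it. A toy demonstration with invented numbers (the same PageRank-style iteration as above, repeated so the snippet runs on its own):

```python
import numpy as np

def visual_rank(S, damping=0.85, iters=200):
    # Same PageRank-style iteration as in the earlier sketch.
    n = S.shape[0]
    cols = S.sum(axis=0)
    cols[cols == 0] = 1.0
    P = S / cols
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = damping * P @ r + (1 - damping) / n
    return r

# Honest graph: image 0 is the most "central" result.
S = np.array([
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.7],
    [0.8, 0.7, 0.0],
])
print("before:", visual_rank(S).round(3))

# Spammer injects two near-duplicates of image 2: each copy is almost
# identical to image 2 and to the other copy, adding heavy edges around it.
dup = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 0.0, 0.95, 0.95],
    [0.1, 0.1, 0.95, 0.0, 0.95],
    [0.1, 0.1, 0.95, 0.95, 0.0],
])
print("after duplicates:", visual_rank(dup).round(3))
```

In this toy graph, image 0 starts out on top, but once two near-duplicates of image 2 are injected, image 2 takes the lead, which is exactly the kind of bias the researchers warn about.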
What does this mean for SEO?
From what I gathered, traditional image optimization will still be valid for some time. In order to get into that initial set of images (the pre-screening set), your images must be optimized (ALT, surrounding text, etc…).
But to get a top ranking, you will probably have to think about similarity analysis beforehand, by researching the top results on Google Image Search for your search query.
Read about this in SEL, TechCrunch, and the NYTimes
Also, please contribute your comments. My eyes glazed over at the math, so anyone who can read the math will probably have a much better understanding of the actual process behind visual link graphs.