Dr. Garcia posted a thread on SEW today showcasing a great collection of resources on the subject of fractals in information retrieval. This is the type of high-level discussion that most people usually find boring, but the fractal nature of the web isn’t a hard concept to understand, and by taking the time to do so, you may find yourself understanding how search engines work to a much greater degree.
#1 – What’s a Fractal?
I started by asking Jeeves, who told me that fractals are “a geometric pattern that is repeated at every scale and so cannot be represented by classical geometry”. This wasn’t entirely helpful, so I looked at some of Jeeves’ suggested web resources and came across this from Cynthia Lanius (technically it’s for kids, but that’s OK):
“Imagine that the picture at the top of this page is a picture of the coastline of Africa. You measure it with mile-long rulers and get a certain measurement. What if on the next day you measure it with foot-long rulers? Which measurement would give you a larger measurement. Since the coastline is jagged, you could get into the nooks and crannies better with the foot-long ruler, so it would yield a greater measurement. Now what if you measured it with an inch-long ruler? You could really get into the teeniest and tiniest of crannies there. So the measurement would be even bigger, that is if the coastline is jagged smaller than an inch. What if it were jagged at every point on the coastline? You could measure it with shorter and shorter rulers, and the measurement would get longer and longer. You could even measure it with infinitesimally short rulers, and the coastline would be infinitely long. That’s fractal.”
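If you'd like to see the ruler effect in numbers, here is a minimal sketch in Python. It does not measure a real coastline; it assumes the Koch curve, a textbook fractal, as a stand-in, and the function name `koch_length` is made up for this illustration. Each time the ruler shrinks to a third of its size, every straight segment turns out to hide four smaller segments, so the measured length keeps growing.

```python
import math


def koch_length(depth: int, base: float = 1.0) -> float:
    """Measured length of a Koch curve using rulers of size base / 3**depth.

    Hypothetical helper for illustration only: the Koch curve stands in
    for the jagged coastline in the quote above.
    """
    # Each refinement replaces every segment with 4 segments,
    # each one third as long as the segment it replaces.
    segments = 4 ** depth
    ruler = base / 3 ** depth
    return segments * ruler


# Shrink the ruler step by step and watch the measurement grow.
for depth in range(6):
    ruler = 1.0 / 3 ** depth
    print(f"ruler = {ruler:.5f}   measured length = {koch_length(depth):.4f}")

# Similarity dimension of the Koch curve: log(4) / log(3), roughly 1.26.
# A smooth line would score exactly 1; the extra 0.26 is the "jaggedness".
dimension = math.log(4) / math.log(3)
print(f"fractal dimension = {dimension:.4f}")
```

Every step multiplies the measured length by 4/3, so there is no ruler small enough to make the number settle down. That is the same behaviour the coastline quote describes.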
#2 Why are fractals relevant to search engines?
Dr. Garcia and other IR researchers surmise that the web itself is fractal in nature to a certain degree. Using math that we won’t get into, they’ve worked out that the link structure of the Internet is similar to the coastline of Africa in the example above: a nearly infinite repeating structure of web pages in websites linking to one another.
#3 So what if the web is fractal?
If the Internet’s link structure is indeed fractal in nature and search engineers can work out the patterns of links, some very important developments can occur:
- Unnatural link structures can be identified and investigated – watch out, link spammers of all kinds!
- Communities of topically relevant pages can be more easily identified and classified – meaning Teoma’s local link popularity model will spread.
- Websites with particularly good content could be identified, and by studying their link structures as fractals, the search engines could get a good idea of what positive linking systems look like. That knowledge could then be used to identify and boost sites with similar link structures.
#4 What about text?
Text, too, is predicted to have a fractal nature, which would allow search engines to recognize, to an even greater degree, the uniqueness, nature and possibly even the quality of the text content on a web page.