Xan Porter has an excellent blog entry on the subject of stemming in search engine indices. The article points to a terrific trove of resources on stemming, how it is performed and what it’s used for.
Basically, stemming is the process of reducing words to a common base form (the stem, which is not always a real word) so that a search engine can treat related word forms as a single term when it analyzes the relationships between the words, the subject of the document, etc. For example, here are some words that have been reduced to their stems by the Porter Algorithm:
surprising → surpris
definitely → definit
fishing → fish
Because variants such as “fishing” and “fished” collapse to the same stem, a search engine can use these stemmed versions of the words to more easily run calculations based on term vectors, term weights and other technical measurements of a document’s content.
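To make the idea concrete, here is a rough Python sketch of how stemming feeds into a term vector. Note that `toy_stem`, `term_vector` and `cosine` are hypothetical names made up for this illustration, and the suffix list is a deliberately crude stand-in: it is not the Porter Algorithm, which applies several ordered rule phases with conditions on the remaining stem. It only happens to agree with Porter on the three example words above.

```python
from collections import Counter
import math
import re

# Toy suffix list, checked in order (longer suffixes first).
# Assumption for illustration only: this is NOT the Porter Algorithm.
SUFFIXES = ("ingly", "edly", "ing", "ely", "ed", "ly", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_vector(text: str) -> Counter:
    """Count how often each stem occurs in the text (a raw term-frequency vector)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(toy_stem(token) for token in tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    for word in ("surprising", "definitely", "fishing"):
        print(word, "->", toy_stem(word))
    doc_a = term_vector("He was fishing")
    doc_b = term_vector("She fishes")
    print(cosine(doc_a, doc_b))
```

Without stemming, “fishing” and “fishes” would count as two unrelated terms and the two short documents would share nothing. With stemming, both contribute to the same “fish” entry, so the similarity score is above zero. A real engine would use a proper stemmer and weighted terms rather than raw counts.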
In other news, I’m now back from my long East Coast trip to Toronto, Boston, New York, New Jersey and even Philadelphia. Thanks to everyone who has e-mailed me for being patient while awaiting a reply.