WordNet is a lexical database of English (words grouped into sets of synonyms and linked by meaning) that has been used in information retrieval (IR) research projects, and by major search engines, for many years. The database can be browsed on Princeton University’s website and downloaded free of charge for use on Linux/Unix/Mac systems.
For a given word or phrase, WordNet can tell you about its:
- Synonyms – Words with the same or nearly the same meaning (soil = dirt). WordNet groups these into sets called synsets.
- Hypernyms – More general terms that name a class of specific things (e.g. land is a hypernym of soil, because soil is a kind of land)
- Hyponyms – More specific terms that are members of a class (e.g. clay is a hyponym of soil, because clay is a kind of soil)
- Holonyms – Terms naming a whole of which other things are parts (e.g. soil is a holonym of nutrients, because nutrients are a part of the soil)
- Meronyms – Terms naming a part of a whole, the inverse of holonyms (e.g. soil is a meronym of ground, because soil is part of the ground)
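To make these relations concrete, here is a minimal Python sketch that models them as a tiny, hand-built lexicon using only the examples above. The names `SYNSETS`, `IS_A`, `PART_OF`, `build_relations` and `describe` are hypothetical, chosen for illustration; this is not WordNet's file format or any library's API, and real WordNet links word senses rather than bare words. The sketch stores each pair once and derives the inverse pointer, since hypernyms/hyponyms and holonyms/meronyms are mirror images of each other.

```python
from collections import defaultdict

# Hypothetical toy lexicon built only from the examples in this post.
# Real WordNet links synsets (sets of synonymous word senses), and its
# actual entries for "soil" differ; this just illustrates the structure.
SYNSETS = {
    "soil": {"soil", "dirt"},
    "land": {"land"},
    "clay": {"clay"},
    "nutrient": {"nutrient"},
    "ground": {"ground"},
}

# Each relation is stored once, in one direction:
# (specific, general) for "is a kind of", (part, whole) for "is a part of".
IS_A = [("soil", "land"), ("clay", "soil")]
PART_OF = [("nutrient", "soil"), ("soil", "ground")]


def build_relations(is_a, part_of):
    """Build all four relation maps, deriving each inverse pointer."""
    rel = {name: defaultdict(set)
           for name in ("hypernyms", "hyponyms", "holonyms", "meronyms")}
    for specific, general in is_a:
        rel["hypernyms"][specific].add(general)  # general is a hypernym of specific
        rel["hyponyms"][general].add(specific)   # specific is a hyponym of general
    for part, whole in part_of:
        rel["holonyms"][part].add(whole)         # whole is a holonym of part
        rel["meronyms"][whole].add(part)         # part is a meronym of whole
    return rel


def describe(word, synsets, rel):
    """Collect synonyms and related terms for one entry as sorted lists."""
    info = {"synonyms": sorted(synsets[word] - {word})}
    for name, mapping in rel.items():
        info[name] = sorted(mapping.get(word, set()))
    return info


RELATIONS = build_relations(IS_A, PART_OF)

if __name__ == "__main__":
    for key, values in describe("soil", SYNSETS, RELATIONS).items():
        print(f"{key:>9}: {', '.join(values) or '-'}")
```

To query the real database instead of a toy one, the NLTK library for Python includes a WordNet reader (`from nltk.corpus import wordnet`, after running `nltk.download('wordnet')`), whose synset objects provide methods such as `hypernyms()`, `hyponyms()`, `part_meronyms()` and `part_holonyms()`.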
WordNet also provides information on coordinate terms (words that share a hypernym), derivationally related forms, word senses and more. For SEOs, understanding what the search engines “know” about your target terms and phrases is a step toward building excellent site architectures. How far the search engines go in classifying this kind of data is unknown, but it seems reasonable to assume that any relationship recorded in WordNet is also available to the commercial search engines. For more information, see my thread on the subject at SEOChat.