I stumbled across Professor Rich Ackerman’s Theory of Information Retrieval site today and found this excellent summary of the theory of normalized term weight. The math and equations get dense in places, but the explanations themselves are very clear. Ackerman even has a sample term vector calculation tool that searches a small set of 10 documents and ranks them by term weight.
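
For anyone curious what ranking by normalized term weight might look like in practice, here is a rough Python sketch. It assumes the common textbook formulation (term frequency times the log of inverse document frequency, divided by the length of the document's weight vector), which may not match what Ackerman's tool actually does. The function names `tokenize`, `normalized_weights`, and `rank` are made up for this illustration, and the whitespace tokenizer is deliberately naive.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase, split on whitespace.
    return text.lower().split()


def normalized_weights(docs):
    """Return one {term: weight} dict per document.

    Raw weight = tf * log(N / df); each document's weights are then divided
    by the Euclidean length of its weight vector, so long documents do not
    win simply by repeating terms more often.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        length = math.sqrt(sum(w * w for w in raw.values()))
        weighted.append({t: (w / length if length else 0.0) for t, w in raw.items()})
    return weighted


def rank(query, docs):
    """Return (doc_index, score) pairs, highest score first."""
    weights = normalized_weights(docs)
    terms = tokenize(query)
    scores = [(i, sum(w.get(t, 0.0) for t in terms)) for i, w in enumerate(weights)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample_docs = [
        "seo seo seo tools",
        "cooking recipes pasta",
        "seo cooking",
    ]
    for index, score in rank("seo", sample_docs):
        print(index, round(score, 3))
```

Note that a term appearing in every document gets a weight of zero under this scheme, since log(N / df) is zero. That is the main difference from keyword density, which only looks at how often a term shows up on a single page and ignores how common the term is across the whole collection.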
SEOmoz is once again working on building an accurate term weight measurement tool so we can do away with keyword density once and for all. We’re also deep into a tool to calculate the on-topic relevance of a page, and we’re planning to train a Bayesian algorithm on the DMOZ data dump and possibly a Wikipedia dump as well (depending on our bandwidth capabilities). It should be an exciting project when it’s finished and will, as always, be available for free to the community.