A terrific paper by Christopher Yang & Fu Lee Wang of the Chinese University of Hong Kong, mentioned by Orion in a thread at SEW but never picked up on, has some worthwhile components that deserve closer examination.
First off, the abstract notes that traditional methods of document (page) summarization by search engines have been shown to be inferior to the method proposed in the paper – fractal summarization. Yang & Wang note that the thematic weight of sentences has traditionally been calculated using the tf*idf (term weight) formula. They continue to use this measurement, but expand on the methodology by separating a page into smaller pieces for on-topic analysis.
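For anyone who hasn't seen tf*idf spelled out, here is a minimal sketch of the traditional approach in Python. It is my own illustration, not code from the paper: the function name `tfidf_sentence_scores` is made up, tokenizing is a plain whitespace split, and treating each sentence as its own "document" for the idf count is an assumption.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence as the sum of tf*idf weights of its terms.

    Assumption of this sketch: every sentence counts as one 'document'
    when computing the inverse document frequency.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Number of sentences in which each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # tf * log(N / df); a term found in every sentence contributes 0.
        scores.append(sum(count * math.log(n / df[term])
                          for term, count in tf.items()))
    return scores
```

Sentences packed with terms that are rare elsewhere on the page score highest, which is why this method sees the page as a flat sequence of sentences rather than a structure.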
“They (traditional summarization techniques) consider the document as a sequence of sentences. In the fractal summarization, the traditional salient features are adopted and the hierarchical fractal structure is also considered.”
“A term is considered more important in a range-block (piece of the document) than other range-blocks if the term appears in the range-block more frequently than other range-blocks”
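To make the range-block idea concrete, here is a rough Python sketch. It is not the paper's formula; as a stand-in, it simply measures what share of a term's total occurrences falls inside each block. The name `block_term_share` and the whitespace tokenizing are assumptions of mine.

```python
from collections import Counter

def block_term_share(blocks):
    """For each block, map every term to the fraction of that term's
    occurrences (across all blocks) that fall inside the block."""
    counts = [Counter(block.lower().split()) for block in blocks]
    totals = Counter()
    for c in counts:
        totals.update(c)
    return [{term: n / totals[term] for term, n in c.items()}
            for c in counts]
```

A term that appears three times in one block and once in another gets a share of 0.75 in the first and 0.25 in the second, so it is treated as more important to the first block.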
“The significance of the heading is inversely proportional to its distance from the sentence. Propagation of fractal value is a promising approach to calculate the heading weight for a sentence.”
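The "inversely proportional to its distance" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the paper propagates fractal values through the document hierarchy, whereas this sketch swaps in a plain 1/distance discount on word overlap, and `heading_weight` is a hypothetical name.

```python
def heading_weight(sentence, heading_chain):
    """Weight a sentence by the words it shares with its headings.

    heading_chain lists headings from the nearest one (distance 1)
    up to the page title (assumed ordering for this sketch).
    """
    words = set(sentence.lower().split())
    weight = 0.0
    for distance, heading in enumerate(heading_chain, start=1):
        overlap = len(words & set(heading.lower().split()))
        # Headings further up the hierarchy count for less.
        weight += overlap / distance
    return weight
```

For example, `heading_weight("link building tips", ["Link Building", "SEO Tips"])` gives 2.5: two shared words with the nearest heading at full value, plus one shared word with the parent heading at half value.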
“(More attention is paid) to some bonus words such as ‘conclusion’”
For SEOs, understanding this document and the papers it cites is important for seeing how far search engines are able to take topic and relevance analysis. This ability will continue to grow over the years, making it more and more imperative that optimization specialists focus on careful site architecture, page topic architecture and link building from on-topic sites and pages. There’s nothing revolutionary here (except the mention of ‘bonus words’ – I wish I had a list), but it does reinforce the importance of not just good writing, but very carefully structured sites to attract the most positive rankings from an algorithm.