Building Enriched Document Representations using Aggregated Anchor Text
Source:
SIGIR (2009)
Abstract:
It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains,
including enterprise search, wiki search, and web search. It
is common practice to enrich a document's standard textual
representation with all of the anchor text associated with
its incoming hyperlinks. However, this approach does not
help match relevant pages with very few inlinks. In this paper,
we propose a method for overcoming anchor text sparsity by enriching document representations with anchor text
that has been aggregated across the hyperlink graph. This
aggregation mechanism acts to smooth, or diffuse, anchor
text within a domain. We rigorously evaluate our proposed
approach on a large web search test collection. Our results
show the approach significantly improves retrieval effectiveness, especially for longer, more difficult queries.