Top-k Aggregation Using Intersections of Ranked Inputs
Source:
Second ACM International Conference on Web Search and Data Mining, Barcelona, Spain (2009)
URL:
http://www.ideal.ece.utexas.edu/~kunal/papers/wsdm2009-topkintersections.pdf
Abstract:
There has been considerable past work on efficiently computing
top k objects by aggregating information from multiple ranked lists
of these objects. An important instance of this problem is query
processing in search engines: One has to combine information from
several different posting lists (rankings) of web pages (objects) to
obtain the top k web pages to answer user queries. Two particularly
well-studied approaches to achieve efficiency in top-k aggregation
include early-termination algorithms (e.g., TA and NRA) and preaggregation
of some of the input lists. However, there has been
little work on a rigorous treatment of combining these approaches.
We generalize the TA and NRA algorithms to the case when preaggregated
intersection lists are available in addition to the original
lists. We show that our versions of TA and NRA continue to remain
“instance optimal,” a very strong optimality notion that is a
highlight of the original TA and NRA algorithms. Using an index
of millions of web pages and real-world search engine queries, we
empirically characterize the performance gains offered by our new
algorithms. We show that the practical benefits of intersection lists
can be fully realized only with an early-termination algorithm.