Publication

A term-based inverted index organization for communication-efficient parallel query processing

Source:

IFIP International Conference on Network and Parallel Computing (2006)

Abstract:

In a typical shared-nothing parallel text retrieval system, an inverted index is distributed over the processors in the system. This distribution is usually achieved via term-based partitioning of the index. That is, the responsibility of processing each term in the vocabulary is uniquely assigned to a processor. The disadvantage of this approach is the high amount of communication overhead incurred during parallel query processing. In this work, we propose a novel inverted index partitioning model for communication-efficient query processing on parallel text retrieval systems that adopt the term-based inverted index organization. The proposed model formulates the index partitioning problem as a hypergraph partitioning problem. The model aims to balance the storage loads of processors while trying to minimize the volume of communication during parallel query processing. We report performance results over a TREC document collection, containing 210,157 documents.