Efficient search engine measurements

Publication
Jan 1, 2007
Abstract

We address the problem of measuring global quality metrics of search engines, such as corpus size, index freshness, and density of duplicates in the corpus. The recently proposed estimators for such metrics [2, 6] suffer from significant bias and/or poor performance, due to inaccurate approximation of the so-called "document degrees".

We present two new estimators that are able to overcome the bias introduced by approximate degrees. Our estimators are based on a careful implementation of an approximate importance sampling procedure. Comprehensive theoretical and empirical analysis of the estimators demonstrates that they have essentially no bias even in situations where document degrees are poorly approximated.
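
As a rough illustration of the importance-sampling idea (a textbook self-normalized estimator, not the authors' approximate procedure), the sketch below assumes documents are sampled with probability proportional to an exactly known degree and reweights each sample by 1/degree to estimate a fraction over the whole corpus. The toy corpus, the `degree` and `is_dup` functions, and the 0.5 duplicate rate are all hypothetical; the paper's contribution concerns the harder case where degrees are only approximated.

```python
import random


def importance_sampling_estimate(samples, degree, indicator):
    """Self-normalized importance-sampling estimate of the fraction of
    documents for which indicator(doc) is true.

    Assumes (hypothetically) that each sample was drawn with probability
    proportional to degree(doc); weighting by 1/degree re-targets the
    uniform distribution over documents.
    """
    num = 0.0
    den = 0.0
    for doc in samples:
        w = 1.0 / degree(doc)  # uniform target / degree-biased trial, up to a constant
        num += w * (1.0 if indicator(doc) else 0.0)
        den += w
    return num / den  # self-normalization cancels the unknown constant


def degree(doc):
    # Hypothetical degrees: 1..10, repeating across the toy corpus.
    return 1 + doc % 10


def is_dup(doc):
    # Hypothetical property that is more common among high-degree documents;
    # exactly half of the toy corpus satisfies it.
    return doc % 10 >= 5


if __name__ == "__main__":
    docs = list(range(1000))
    rng = random.Random(0)
    samples = rng.choices(docs, weights=[degree(d) for d in docs], k=100_000)

    naive = sum(is_dup(d) for d in samples) / len(samples)
    print(f"naive: {naive:.3f}")  # biased toward high-degree documents, near 40/55
    print(f"IS:    {importance_sampling_estimate(samples, degree, is_dup):.3f}")  # near 0.5
```

The naive fraction overweights high-degree documents, while the reweighted estimate recovers the uniform-corpus fraction; when degrees are only approximate, the weights themselves become biased, which is the problem the abstract describes.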

Building on an idea from [6], we discuss Rao-Blackwellization as a generic method for reducing variance in search engine estimators. We show that Rao-Blackwellizing our estimators results in significant performance improvements without compromising accuracy.
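
Rao-Blackwellization replaces an estimator by its conditional expectation given some statistic; the result keeps the same mean and has no larger variance. The toy sketch below assumes, hypothetically, an estimator that samples a query and then evaluates a quantity on one random result of that query; conditioning on the query means averaging over all of its results instead. The query pool `QUERIES` and the `is_even` quantity are illustrative stand-ins, not the construction used in [6] or in the paper.

```python
import random
from statistics import mean, pvariance

# Hypothetical toy query pool: each query maps to the documents it returns.
QUERIES = {
    "q1": [0, 1, 2, 3],
    "q2": [4, 5],
    "q3": [6, 7, 8],
}


def is_even(doc):
    # Stand-in for any per-document quantity being estimated.
    return 1.0 if doc % 2 == 0 else 0.0


def crude_estimate(results, f, rng):
    # Evaluate f on a single uniformly chosen result of the sampled query.
    return f(rng.choice(results))


def rao_blackwellized_estimate(results, f):
    # E[f(doc) | query]: average f over every result of the query,
    # integrating out the inner random choice of document.
    return mean(f(d) for d in results)


def simulate(trials, seed=0):
    rng = random.Random(seed)
    crude, rb = [], []
    for _ in range(trials):
        q = rng.choice(list(QUERIES))  # the same sampled query feeds both estimators
        crude.append(crude_estimate(QUERIES[q], is_even, rng))
        rb.append(rao_blackwellized_estimate(QUERIES[q], is_even))
    return crude, rb


if __name__ == "__main__":
    crude, rb = simulate(20_000)
    print(f"crude: mean={mean(crude):.3f} var={pvariance(crude):.4f}")
    print(f"RB:    mean={mean(rb):.3f} var={pvariance(rb):.4f}")
```

Both means should land near 5/9, while the Rao-Blackwellized variance is far smaller; the trade-off is that the quantity must be evaluated on every result of the sampled query rather than on a single one.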

  • WWW, Banff, Alberta, Canada