Online Result Cache Invalidation for Real-time Web Search

Dec 8, 2012

Abstract: Caches of results are critical components of modern Web search engines, since they enable lower response times for frequent queries and reduce the load on the search engine back-end. Results in long-lived cache entries may become stale, however, as search engines continuously update their index to incorporate changes to the Web. Consequently, it is important to provide mechanisms that control the degree of staleness of cached results, ideally enabling the search engine to always return fresh results. In this paper, we present a new mechanism that identifies and invalidates, online, cached query results that have become stale. The basic idea is to evaluate, at query time and against recent changes, whether the results of cache hits have changed. To improve invalidation efficiency, the generation time of cached queries and their chronological order with respect to the latest index update are used to prune unaffected queries early. We evaluate the proposed approach using documents that change over time and query logs of the Yahoo search engine. We show that the proposed approach ensures good query results (50% fewer stale results) and high invalidation accuracy (90% fewer unnecessary invalidations) compared to a baseline approach that makes invalidation decisions off-line. More importantly, the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.
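
The idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class and method names are hypothetical, time is a simple logical clock, and the test for "affected" (a changed document containing every query term) is an assumption chosen only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    results: list      # cached result list for the query
    generated_at: int  # logical time at which the results were computed


@dataclass
class OnlineInvalidatingCache:
    """Hypothetical result cache that checks staleness at query time."""
    entries: dict = field(default_factory=dict)  # query string -> CacheEntry
    updates: list = field(default_factory=list)  # (time, terms of a changed doc), in time order

    def record_update(self, time, doc_terms):
        # Remember a recent index change; assumed to arrive in time order.
        self.updates.append((time, set(doc_terms)))

    def put(self, query, results, time):
        self.entries[query] = CacheEntry(results, time)

    def get(self, query):
        entry = self.entries.get(query)
        if entry is None:
            return None  # cache miss
        # Early pruning: an entry generated at or after the latest index
        # update cannot have been affected by any recorded change.
        if not self.updates or entry.generated_at >= self.updates[-1][0]:
            return entry.results
        # Otherwise, evaluate the query only against changes newer than the entry.
        terms = set(query.split())
        for time, doc_terms in reversed(self.updates):
            if time <= entry.generated_at:
                break  # all remaining changes predate the entry
            if terms <= doc_terms:  # assumed "affected" test: doc has all query terms
                del self.entries[query]
                return None  # treat as a miss so the back-end recomputes
        return entry.results
```

In this sketch, a `None` return from `get` means the caller should recompute the results at the back-end and store them again with `put`; a real system would also need to bound the list of recorded updates.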

  • Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Portland, USA