Project

Similarity Caching



Yahoo! Labs is diligently working on enhancing the conventional notion of caching to make it more effective for contextual advertising systems. The team consists of researchers Andrei Broder, Vanja Josifovski, Ravi Kumar, Sandeep Pandey, Sergei Vassilvitskii and summer intern Flavio Chierichetti.

In the classical setting, a cache (which refers to a fast but limited amount of memory) is used to store and serve items that are repeatedly requested. A cache hit occurs when a requested item happens to be in the cache: in this case the cost of serving the item is minimal. Otherwise, if the cache does not contain the requested item, the item is fetched from disk (a form of slower but unlimited memory) at a substantially higher cost. At this point the new item can be stored in the cache to serve future requests, but since the cache size is limited, another item must be evicted. A caching policy is a decision procedure that states which items will be stored and which items will be evicted with the goal of serving as many requests as possible. Bear in mind that the request sequence is not known in advance. This is the setting of the classical caching problem, which has been studied for nearly half a century and has a rich research history.

The Yahoo! Labs team recognized that classical caching may not be effective in applications such as contextual advertising where requests are unlikely to be repeated – in that case almost none of the requests would hit an item already in a cache. The dearth of repetition happens because a request in a contextual advertising is a query that captures both the content of a particular web page and the demographics of a user viewing that page. The result of the query is a set of ads to be displayed to the user. Since both page contents and user demographics have myriad of possible values, it is quite unlikely that exactly the same query is seen often enough to make conventional caching effective.

On the other hand, serving contextual ads offers the following flexibility: the same ads can be shown to similar users visiting similar pages. Many other applications behave in the same manner, allowing the serving of an item from the cache that is similar but not necessarily identical to the requested item, thus leading to the notion of similarity caching.

“What we are doing is essentially a new take on a classical problem motivated by modern applications,” says researcher Sandeep Pandey.

The objective of similarity caching is to serve as many requests from the cache as possible, by implementing a policy that takes advantage of the freedom to provide a similar item. Imagine each item as a point in space and consider a ball around each point in the cache –a request that falls into one of the cached balls results in a hit. The similarity caching policy aims to cache the balls that are deemed most likely to be hit. Furthermore, the choice of balls should be judicious: they should not overlap each other much.

Similarity caching can improve the performance of contextual advertising systems by helping them handle big spikes in traffic. For example, if the system can handle only one million requests per hour and the traffic goes up drastically during certain hours of the day, similarity caching will kick in to help serve ads at peak times at the cost of a slight loss of precision. The advantage is that the infrastructure cost is reduced and user experience improved, since we can deal with isolated load peaks by reducing precision and not by increasing response time.