Improved Caching Techniques for Large-scale Image Hosting Services

Jul 18, 2016

Commercial image serving systems, such as Flickr and Facebook, rely on large image caches to avoid, as much as possible, retrieving requested images from the costly backend image store. Such systems serve the same image at different resolutions, and thus in different sizes, to different clients, depending on the properties of the clients’ devices. Each requested resolution of an image can be cached individually, as in traditional caches, which reduces the backend workload. However, a potentially better approach is to store relatively high-resolution images in the cache and resize them during retrieval to obtain lower-resolution images. This on-the-fly image resizing capability enables image serving systems to deploy more sophisticated caching policies and further improve their serving performance.
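To make the resizing idea concrete, here is a minimal sketch of how a cache with on-the-fly resizing might decide which stored copy answers a request. It assumes, hypothetically, that resolutions are plain integers (e.g. the longest edge in pixels), that any cached copy at a resolution at least as high as the requested one can be downscaled to serve it, and that the smallest such copy is the cheapest to resize. The names `Cache` and `serve_source` are illustrative, not taken from the paper.

```python
from typing import Dict, Optional, Set

# Hypothetical cache contents: image id -> set of cached resolutions.
Cache = Dict[str, Set[int]]


def serve_source(cache: Cache, image_id: str, requested: int) -> Optional[int]:
    """Return the cached resolution used to serve a request, or None on a miss.

    Assumption: a request for resolution `requested` can be served by any
    cached copy whose resolution is >= `requested` (downscaled on the fly).
    The smallest usable copy is chosen, assuming it is the cheapest to resize.
    """
    usable = [r for r in cache.get(image_id, set()) if r >= requested]
    return min(usable) if usable else None
```

For example, with `{"img1": {320, 1024}}` cached, a request for `img1` at 640 is served by resizing the 1024 copy, a request at 320 is an exact hit, and a request at 2048 falls through to the backend store.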

In this paper, we formalize the static caching problem in image serving systems that provide on-the-fly image resizing functionality in their edge or regional caches. We propose two gain-based caching policies that construct a static, fixed-capacity cache to reduce the average serving time of images. The basic idea of the proposed policies is to identify the best resolution(s) of each image to cache so that the average serving time of future image retrieval requests is reduced. We conduct extensive experiments using real-life data access logs obtained from Flickr. We show that one of the proposed caching policies reduces the average response time of the service by up to 4.2% relative to the best-performing baseline, which mainly relies on access frequency information to make caching decisions. This improvement corresponds to a reduction of about 25% in cache size under similar serving time constraints.
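The abstract does not give the gain function used by the proposed policies, so the following is only a hypothetical sketch of what a gain-based static cache construction can look like, not the paper's method. It assumes a simple cost model (a miss costs `backend_time`, an exact hit costs nothing, a hit that needs resizing costs `resize_time`) and fills the cache greedily by marginal gain per byte, recomputing gains after every pick. The names `request_cost`, `build_static_cache`, `freq`, `sizes`, `backend_time` and `resize_time`, and their default values, are all illustrative assumptions.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, int]  # (image_id, resolution)


def request_cost(cached: Set[int], q: int, backend_time: float, resize_time: float) -> float:
    """Hypothetical serving cost of one request for resolution q of an image."""
    usable = [r for r in cached if r >= q]
    if not usable:
        return backend_time  # miss: fetch from the backend image store
    return 0.0 if min(usable) == q else resize_time  # exact hit vs. resized hit


def build_static_cache(freq: Dict[Key, int], sizes: Dict[Key, int], capacity: int,
                       backend_time: float = 10.0, resize_time: float = 1.0) -> Set[Key]:
    """Greedy gain-per-byte selection of (image, resolution) copies (assumed sizes > 0)."""
    cached: Dict[str, Set[int]] = {}
    chosen: Set[Key] = set()
    used = 0
    while True:
        best, best_ratio = None, 0.0
        for key, size in sizes.items():
            image_id, r = key
            if key in chosen or used + size > capacity:
                continue
            current = cached.get(image_id, set())
            # Marginal gain: serving-time reduction over logged requests that
            # this copy could serve (same image, requested resolution <= r).
            gain = 0.0
            for (img, q), f in freq.items():
                if img == image_id and q <= r:
                    before = request_cost(current, q, backend_time, resize_time)
                    after = request_cost(current | {r}, q, backend_time, resize_time)
                    gain += f * (before - after)
            ratio = gain / size
            if ratio > best_ratio:
                best, best_ratio = key, ratio
        if best is None:  # nothing fits or no remaining candidate has positive gain
            return chosen
        chosen.add(best)
        used += sizes[best]
        cached.setdefault(best[0], set()).add(best[1])
```

Recomputing marginal gains matters here: once a low-resolution copy of an image is cached, a higher-resolution copy of the same image only gains from requests the first copy cannot serve, so a frequency-only ranking would overestimate its value.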

  • ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016)
  • Conference/Workshop Paper