What Can Search Predict?

Oct 7, 2010

Originally published on the Yahoo Search Blog on October 1, 2010.

This week, research scientists at Yahoo Labs published a paper in the Proceedings of the National Academy of Sciences that examines the possibility of using web search data to predict consumer behavior. Their results have captured the public imagination and the attention of more than a few media outlets, including Technology Review, Ars Technica, Reuters, and the BBC. Today, study co-author Sharad Goel shares his thoughts on the team’s conclusions.

Observing what people do online at any moment can create a compelling snapshot of our collective consciousness, instantaneously reflecting the interests, concerns, and intentions of people around the world. We can even take this idea one step further: What people are searching for today can be predictive of what they will do in the near future.

It’s easy to see why search data has the potential to predict: Consumers looking to buy a new camera may search the web to compare models, movie-goers may search for the opening date of a new film or to find movie theaters showing it, and travelers planning a vacation may search for places of interest, airline tickets, or hotel rooms. By aggregating the volume of search queries related to retail, movies, or travel, we might be able to predict collective behavior around economics, culture, or politics.

We investigated the predictive power of search by using query volume to predict three types of outcomes: the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart. In all three cases we found that search counts were highly predictive of future outcomes days, and even weeks, in advance.
These results show that search can predict the near future, a finding that may apply to a wide range of consumer behaviors (such as airline travel, hotel vacancy rates, and auto sales) and economic indicators (such as real-estate prices, credit card defaults, and consumer confidence indices).

How do search-based predictions perform compared to some traditional methods? We compared all search-based predictions with simple models built on publicly available information. For movies, we compared search-based predictions to a baseline that incorporates traditional predictive information like the movie’s budget, the number of screens on which it opened, and projections from the Hollywood Stock Exchange, a play-money prediction market. In many cases, we found that these traditional predictions performed on par with those generated from search. Although search data are indeed predictive of future outcomes, alternative information sources often perform equally well.

However, when we consider cases where traditional prediction data do not exist, search volume becomes a more useful predictor. For instance, take non-sequel video games, for which there is little publicly available information. In this case, search-based predictions substantially outperformed the baseline. Search appears to be most useful when key indicators (past sales performance, production budgets, etc.) do not exist or are unavailable.

By adding search data to traditional models, we generally see improved performance, where the benefit ranges from modest to substantial depending on the topic. For example, search data are particularly helpful in identifying songs that will rapidly fall from the top of the charts, a difficult task for traditional methods, resulting in an augmented model that is quite a bit better than the baseline.
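To make the comparison concrete, here is a minimal Python sketch of the kind of evaluation described above: a baseline model, a search-only model, and a combined model, each scored by how much of the variation in the outcome it explains. It is not the model from the paper. The log-linear form, the choice of features, every function name, and all of the numbers in `films` are assumptions made up purely for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, features):
    """Apply fitted coefficients (intercept first) to each feature row."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], f)) for f in features]


def r_squared(targets, preds):
    """Fraction of the variance in targets explained by preds."""
    mean = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# HYPOTHETICAL data, invented for this sketch (not from the study):
# (budget in $M, opening screens, pre-release search count, opening revenue in $M)
films = [
    (150, 4000, 900000, 95.0),
    (30, 2500, 120000, 14.0),
    (80, 3200, 400000, 42.0),
    (10, 1200, 30000, 3.5),
    (200, 4200, 1500000, 130.0),
    (60, 3000, 150000, 20.0),
    (25, 2000, 200000, 18.0),
    (110, 3600, 350000, 48.0),
]

# Assumption: work in log space so the model relates proportional changes.
log_revenue = [math.log(f[3]) for f in films]
baseline_x = [(math.log(f[0]), math.log(f[1])) for f in films]
search_x = [(math.log(f[2]),) for f in films]
combined_x = [b + s for b, s in zip(baseline_x, search_x)]


def in_sample_r2(features):
    """Fit on the given features and score the fit on the same rows."""
    return r_squared(log_revenue, predict(fit_ols(features, log_revenue), features))


r2_baseline = in_sample_r2(baseline_x)
r2_search = in_sample_r2(search_x)
r2_combined = in_sample_r2(combined_x)

print(f"baseline R^2: {r2_baseline:.3f}")
print(f"search   R^2: {r2_search:.3f}")
print(f"combined R^2: {r2_combined:.3f}")
```

One caveat on reading the output: scored on the same rows it was fitted to, the combined model can never do worse than either of its parts, so a higher in-sample score alone says little. A fair comparison would score each model on films held out from fitting.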
The potential for search-based predictions seems greatest for applications like financial analysis where even a minimal performance edge can be valuable, or for situations in which it is cumbersome or expensive to collect and parse data from traditional sources. Ultimately, search can be useful in predicting real-world events, not because it is better than traditional data, but because it is fast, convenient, and offers insight into a wide range of topics.

Sharad Goel
Senior Research Scientist, Yahoo Labs