Mapping Search Query Language to Advertiser Bidded Terms

Jun 17, 2009

Advertisers want to advertise alongside search results because it is a very effective channel for reaching someone who is searching for an item in their sales inventory. One problem they face is that search users phrase their queries in very diverse ways. One user may describe a product with a basic term, while another may use a highly descriptive phrase for the same or a similar product. To advertise a product effectively, the advertiser would like to target all similar queries, but it is very difficult to predict how search users will refer to any particular product. Yahoo Labs Sponsored Search Sciences is therefore working on techniques that learn mappings from the user-query space to the advertiser bidded-term space.

There are two components to the solution. First, candidate mappings are generated using a variety of techniques; second, a relevance score is computed for each candidate to ensure that only the most appropriate mappings are used. In addressing both problems, Yahoo Labs is able to leverage the large volume of search queries Yahoo receives each day, along with deep access to the Yahoo search engine.

For queries that appear frequently in the logs, the candidate mappings can be pre-computed. Several approaches are being investigated, including collaborative filtering techniques and query traversal graphs, to derive relationships between queries and the ads that search users have clicked on. For queries that have not been pre-computed, online processing uses the pre-computed mappings to try to predict good mappings for the unseen queries. Continuing research in these areas aims to pull in more knowledge sources and to apply more semantic reasoning to candidate generation.
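As a rough illustration of the candidate-generation idea, the sketch below builds a query-to-ad click graph and walks it to the bidded terms attached to the clicked ads. It is not the method used by Yahoo Labs; the function names (`build_candidates`, `candidates_for_unseen`), the click-share weighting, the token-overlap fallback for unseen queries, and all of the data are assumptions made purely for the example.

```python
from collections import defaultdict


def build_candidates(click_log, ad_bidded_terms, top_k=3):
    """Pre-compute candidate bidded terms for every query seen in the log.

    click_log: iterable of (query, ad_id) click events (hypothetical format).
    ad_bidded_terms: dict mapping ad_id -> set of bidded terms (hypothetical).
    Returns a dict mapping query -> list of (bidded_term, weight), best first.
    """
    # Count clicks on each edge of the bipartite query-ad graph.
    query_ad_clicks = defaultdict(lambda: defaultdict(int))
    for query, ad_id in click_log:
        query_ad_clicks[query][ad_id] += 1

    candidates = {}
    for query, ad_counts in query_ad_clicks.items():
        total = sum(ad_counts.values())
        term_weight = defaultdict(float)
        # Two-step walk: query -> clicked ad -> terms bid on that ad.
        # Each term is weighted by the share of the query's clicks it received.
        for ad_id, count in ad_counts.items():
            for term in ad_bidded_terms.get(ad_id, ()):
                term_weight[term] += count / total
        ranked = sorted(term_weight.items(), key=lambda kv: (-kv[1], kv[0]))
        candidates[query] = ranked[:top_k]
    return candidates


def candidates_for_unseen(query, candidates, min_overlap=0.5):
    """Assumed online fallback: reuse the mappings of the most similar
    pre-computed query, with similarity measured as token Jaccard overlap."""
    tokens = set(query.lower().split())
    best, best_score = None, 0.0
    for seen in sorted(candidates):
        seen_tokens = set(seen.lower().split())
        union = tokens | seen_tokens
        score = len(tokens & seen_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = seen, score
    if best is not None and best_score >= min_overlap:
        return candidates[best]
    return []


# Made-up click events and bidded terms, for illustration only.
click_log = [
    ("cheap running shoes", "ad1"),
    ("cheap running shoes", "ad1"),
    ("cheap running shoes", "ad2"),
    ("trail sneakers", "ad2"),
]
ad_bidded_terms = {
    "ad1": {"running shoes"},
    "ad2": {"athletic footwear", "running shoes"},
}

precomputed = build_candidates(click_log, ad_bidded_terms)
unseen = candidates_for_unseen("running shoes cheap online", precomputed)
```

In this toy setup the unseen query "running shoes cheap online" inherits the candidates of "cheap running shoes" because the two share three of four tokens. A production system would presumably use richer similarity signals than token overlap.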

To score the mappings, we gather knowledge about the queries and the phrases within them. Knowledge sources include the search engine results page as well as internal data from the search engine. Query processing to identify segments and semantics within the queries is also important. In addition, we use the advertiser database to keep transformations coherent, so that the product intent of the original query is preserved. All these knowledge sources are used to generate features and to train machine-learned models that predict the relevance of the mappings.
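The scoring step can be sketched in the same spirit. The example below trains a tiny logistic-regression model on hand-labeled (query, bidded term) pairs and uses it to filter candidates. Everything here is an assumption for illustration: the two lexical features stand in for the much richer knowledge sources described above, logistic regression stands in for whatever machine-learned models are actually used, and the training pairs, the 0.5 threshold, and the function names are invented.

```python
import math


def features(query, term):
    """Toy feature vector for a (query, bidded term) pair.

    Stand-in features only: a bias, the token Jaccard overlap, and the
    fraction of the bidded term's tokens that appear in the query.
    """
    q, t = set(query.lower().split()), set(term.lower().split())
    union = q | t
    jaccard = len(q & t) / len(union) if union else 0.0
    coverage = len(q & t) / len(t) if t else 0.0
    return [1.0, jaccard, coverage]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient ascent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for query, term, label in examples:
            x = features(query, term)
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i, xi in enumerate(x):
                weights[i] += lr * (label - pred) * xi
    return weights


def relevance(weights, query, term):
    """Predicted probability that the bidded term is relevant to the query."""
    x = features(query, term)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def filter_candidates(weights, query, terms, threshold=0.5):
    """Keep only the candidate terms whose predicted relevance clears the bar."""
    return [t for t in terms if relevance(weights, query, t) >= threshold]


# Hypothetical labeled pairs: 1 = relevant mapping, 0 = irrelevant.
training_pairs = [
    ("cheap running shoes", "running shoes", 1),
    ("cheap running shoes", "garden hose", 0),
    ("red leather sofa", "leather sofa", 1),
    ("red leather sofa", "car insurance", 0),
    ("digital camera deals", "digital camera", 1),
    ("digital camera deals", "pizza delivery", 0),
]

model = train(training_pairs)
kept = filter_candidates(model, "blue suede shoes", ["suede shoes", "laptop bag"])
```

Separating feature extraction from the model keeps the sketch easy to extend: additional knowledge sources would enter as extra entries in the feature vector without changing the training or filtering code.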