Predicting Query to Ad Relevance

Jun 17, 2009

One of the differences between web search results and sponsored search results is the commercial intent. For user satisfaction, it is important that users find the ads displayed on the search page both relevant and useful. Similarly, advertisers pay for click-through traffic and care about return on investment (ROI), so it is important to direct to each advertiser only the traffic that is likely to use that advertiser's services. Although ad-retrieval mechanisms are designed to take a user query and pull relevant ads from a database, a second round of relevance scoring (the relevance of an ad given the query) can take more information into account and produce a consistent relevance score across all ads for the query.

Relevance scores are useful in many aspects of sponsored search. They can help in filtering out low-relevance ads and in deciding where to show the remaining ads (more prominent vs. less prominent page locations). Relevance scores also make it possible to compare the quality of the candidate ads with the quality of the web results, which helps inform the decision of whether to show any ads at all. Yahoo Labs Sponsored Search Sciences is working on both query processing and advertiser-side processing to generate features for use in relevance prediction models.
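As an illustration of the filtering and placement decisions described above, here is a minimal sketch. It assumes relevance scores lie in [0, 1]; the names `ScoredAd` and `place_ads` and both threshold values are hypothetical choices made for this example, not anything used in a production system.

```python
from dataclasses import dataclass


@dataclass
class ScoredAd:
    ad_id: str
    relevance: float  # assumed to lie in [0, 1]


def place_ads(ads, filter_threshold=0.3, prominent_threshold=0.7):
    """Drop low-relevance ads and split the rest by page location.

    Both thresholds are illustrative values, not tuned settings.
    """
    placements = {"prominent": [], "secondary": []}
    # Consider the most relevant ads first so each slot list stays ranked.
    for ad in sorted(ads, key=lambda a: a.relevance, reverse=True):
        if ad.relevance < filter_threshold:
            continue  # filtered out as a low-relevance ad
        if ad.relevance >= prominent_threshold:
            placements["prominent"].append(ad.ad_id)
        else:
            placements["secondary"].append(ad.ad_id)
    return placements
```

If every candidate falls below `filter_threshold`, both lists come back empty, which corresponds to showing no ads at all for that query.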

On the query side, it is important to work out the intent of the user query using natural language processing (NLP) techniques, segmentation, and entity detection. Since the typical search query is short (2-3 words), it is also useful to expand the query context by using information available from the search engine. One example of ongoing research in this area is detecting the important query terms and working out how they combine into a semantic entity.

The advertiser text also needs to be processed in a similar manner to the query. Although the processing is nominally the same, the problem is different because the underlying generation models differ. Queries are typically keyword-based, whereas the advertiser text is written in marketing language that emphasizes features or price. Thus the advertiser side involves more text analysis and parsing than the query side.

Key research areas in building machine-learned models to predict relevance include feature generation, classification of queries and ads, and language processing. Related research is aimed at other aspects of advertiser behavior that can affect user experience, such as modeling whether advertisers portray themselves accurately in the advertiser text.
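To make the feature generation step concrete, the sketch below computes a few simple lexical-overlap features between a query and an ad's text. These particular features, the tokenizer, and the function names are assumptions made for illustration only. Features actually used in relevance prediction would draw on the richer query-side and advertiser-side processing described earlier.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_features(query, ad_text):
    """Return illustrative lexical-overlap features for a query/ad pair."""
    query_tokens = tokenize(query)
    query_terms = set(query_tokens)
    ad_terms = set(tokenize(ad_text))
    shared = query_terms & ad_terms
    union = query_terms | ad_terms
    return {
        # Fraction of distinct query terms that also appear in the ad text.
        "query_coverage": len(shared) / len(query_terms) if query_terms else 0.0,
        # Jaccard similarity between the two sets of terms.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Queries are short, so length itself can be an informative signal.
        "query_length": len(query_tokens),
    }
```

A feature dictionary like this could be passed to any standard classifier. The choice of model is left open here.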