Context- and Content-aware Embeddings for Query Rewriting in Sponsored Search

Aug 10, 2015

Search engines represent one of the most popular web services, visited by more than 85% of internet users on a daily basis. Advertisers are interested in making use of this vast business potential, as very clear intent signal communicated through the issued query allows effective targeting of users. This idea is embodied in a sponsored search model, where each advertiser maintains a list of keywords they deem indicative of increased user response rate with regards to their business. According to this targeting model, when a query is issued all advertisers with a matching keyword are entered into an auction according to the amount they bid for the query, and the winner gets to show their ad. One of the main challenges is the fact that a query may not match many keywords, resulting in lower auction value, lower ad quality, and lost revenue for advertisers and publishers. Possible solution is to expand a query into a set of related queries and use them to increase the number of matched ads, called query rewriting. To this end, we propose rewriting method based on a novel query embedding algorithm, which jointly models query content as well as its context within an online session. As a result, queries with similar content and context are mapped into vectors close in the embedding space, which allows expansion of a query via simple K-nearest neighbor search in the projected space. The method was trained on more than 12 billion sessions, one of the largest corpuses reported thus far, and evaluated on both public TREC data set and in-house sponsored search data set. The results show the proposed approach significantly outperformed existing state-of-the-art, strongly indicating its benefits and the monetization potential.

  • 38th Annual ACM Special Interest Group on Information Retrieval (SIGIR) Conference (SIGIR 2015)
  • Conference/Workshop Paper