Project

Query the Obscure



Gaining a better understanding of queries is a top priority of the search industry. Why? Because knowing that the query “SD450” is about cameras while “NC4200” is about laptops can lead to improved search results, more focused advertisements, and a better user experience.

“If you want to put better advertising on a query, you have to first understand what that query is about,” explains Andrei Broder, Yahoo! Research Fellow and Vice President of Search Technology and Computational Advertising.

When it comes to common searches that repeat millions of times, like “Britney Spears” or “Hybrid Cars,” returning the most appropriate results, or advertisements, is not difficult. But what about queries that are exceptionally rare and may never appear more than once? These queries are far harder for the search engine to understand.

The query “ski wax,” for example, is fairly common and easy to grasp. But a much rarer search is “fluorocarbon wet snow.” This is actually a type of ski wax designed for warmer weather. For a die-hard ski enthusiast, it’s a very meaningful search, but for a search engine it’s quite challenging because it appears so infrequently.

With popular queries, it is possible for the search engine to learn from the past. It can judge the effectiveness of search results and advertisements by counting how many people clicked on them. But for queries that seldom repeat, there is no real history from which to learn.

So how can a search engine ever understand these more obscure queries? Broder and a team of Yahoo! researchers, including Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja Josifovski, and Tong Zhang, set out to tackle this problem. Their work is outlined in a paper called “Robust Classification of Rare Queries Using Web Knowledge,” which appeared in SIGIR 2007.

To address the problem, the Yahoo! team proposed a methodology for using search results, as well as information available on the Web, as a source of external knowledge. To this end, they sent rare queries to a search engine and assumed that a majority of the highest-ranking search results were relevant to the query. Categorizing these results allowed the team to classify the original query with high accuracy.

The results confirmed that using the Web as a repository of world knowledge contributes valuable information about the query and aids in its correct classification. “We discovered the best source of information to understand what these rare queries are about is to look at the search results,” Broder explains. “If you look at each returned page as a vote on what the query is about, you find that the majority tends to be correct even though many individual pages are wrong.”

So, in the case of the “fluorocarbon wet snow” search, for example, some result pages may include chemistry or fishing, but the majority will be about ski wax. From there, it is a matter of properly classifying the results so that users see the most relevant, thematically correct pages and advertisements.
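
The voting idea Broder describes can be illustrated with a minimal Python sketch. This is not the method from the paper, which is more involved; it is a plain majority vote, and the function name, the category labels, and the vote counts below are all hypothetical, chosen only to mirror the ski wax example.

```python
from collections import Counter


def classify_query_by_votes(result_categories):
    """Pick the category that most result pages agree on.

    result_categories holds one category label per returned page.
    Returns (winning_label, share_of_votes), or (None, 0.0) if the
    list is empty. Ties go to the label that appears first.
    """
    if not result_categories:
        return None, 0.0
    votes = Counter(result_categories)
    label, count = votes.most_common(1)[0]
    return label, count / len(result_categories)


# Hypothetical labels for the top ten results of "fluorocarbon wet snow".
# These counts are invented for illustration, not taken from the paper.
top_results = ["ski wax"] * 6 + ["chemistry"] * 2 + ["fishing"] * 2

label, share = classify_query_by_votes(top_results)
print(label, share)
```

Here four of the ten pages are "wrong" about the query, yet the majority still lands on ski wax. The share of votes gives a rough sense of how much the pages agree, which a real system might use to decide whether the classification is trustworthy.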

Ultimately, the Yahoo! team believes its methodology holds considerable promise for the handling of rare queries—both for improving search results and yielding better matched advertisements.