Vertical selection is the task of predicting relevant verticals for a Web query so as to enrich the Web search results with complementary vertical results. We investigate a novel variant of this task, where the goal is to detect queries with a question intent. Specifically, we address queries for which the user would like an answer with a human touch. We call these CQA-intent queries, since answers to them are typically found in community question answering (CQA) sites.
A typical approach in vertical selection is using a vertical's specific language model of relevant queries and computing the query-likelihood for each vertical as a selective criterion. This works quite well for many domains like Shopping, Local and Travel. Yet, we claim that queries with CQA intent are harder to distinguish by modeling content alone, since they cover many different topics. We propose to also take the structure of queries into consideration, reasoning that queries with question intent have quite a different structure than other queries.
We present a supervised classification scheme, random forest over word-clusters for variable length texts, which can model the query structure. Our experiments show that it substantially improves classification performance in the CQA-intent selection task compared to content-oriented based classification, especially as query length grows.