Syntactic Parsing of Web Queries with Question Intent

Publication
Jun 12, 2016
Abstract

Accurate automatic processing of Web queries is important for high-quality information retrieval from the Web. While the syntactic structure of a large portion of these queries is trivial, the structure of queries with question intent is much richer. In this paper we therefore address the task of statistical syntactic parsing of such queries. We first show that the standard dependency grammar does not ac- count for the full range of syntactic structures manifested by queries with question intent. To alleviate this issue we extend the dependency grammar to account for segments – independent syntactic units within a potentially larger syntactic structure. We then propose two distant supervision approaches for the task. Both algorithms do not require manually parsed queries for training. Instead, they are trained on millions of (query, page title) pairs from the Community Question Answering (CQA) domain, where the CQA page was clicked by the user who initiated the query in a search engine. Experiments on a new treebank consisting of 5,000 Web queries from the CQA domain, manually parsed using the proposed grammar, show that our algorithms outperform alternative approaches trained on various sources: tens of thousands of manually parsed OntoNotes sentences, millions of unlabeled CQA queries and thousands of manually segmented CQA queries.

  • The North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2016)
  • Conference/Workshop Paper

BibTeX