queryCategorizr: A Large-Scale Semi-Supervised System for Categorization of Web Search Queries

May 18, 2015

Understanding interests expressed through user's search query is a task of critical importance for many internet applications. To help identify user interests, web engines commonly utilize classification of queries into one or more pre-defined interest categories. However, majority of the queries are noisy short texts, making accurate classification a challenging task. In this demonstration, we present queryCategorizr, a novel semi-supervised learning system that embeds queries into low-dimensional vector space using a neural language model applied on search log sessions, and classifies them into general interest categories while relying on a small set of labeled queries. Empirical results on large-scale data show that queryCategorizr outperforms the current state-of-the-art approaches. In addition, we describe a Graphical User Interface (GUI) that allows users to query the system and explore classification results in an interactive manner.

  • International World Wide Web Conference (WWW) 2015
  • Demo Paper