Rank by Time or by Relevance?: Revisiting Email Search

Oct 19, 2015

With Web mail services offering larger and larger storage capacity, most users do not feel the need to systematically delete messages anymore and inboxes keep growing. It is quite surprising that in spite of the huge progress of relevance ranking in Web Search, mail search results are still typically ranked by date. This can probably be explained by the fact that users demand perfect recall in order to "re-find" a previously seen message, and would not trust relevance ranking. Yet mail search is still considered a difficult and frustrating task, especially when trying to locate older messages. In this paper, we study the current search traffic of Yahoo mail, a major Web commercial mail service, and discuss the limitations of ranking search results by date. We argue that this sort-by-date paradigm needs to be revisited in order to account for the specific structure and nature of mail messages, as well as the high-recall needs of users. We describe a two-phase ranking approach, in which the first phase is geared towards maximizing recall and the second phase follows a learning-to-rank approach that considers a rich set of mail-specific features to maintain precision. We present our results obtained on real mail search query traffic, for three different datasets, via manual as well as automatic evaluation. We demonstrate that the default time-driven ranking can be significantly improved in terms of both recall and precision, by taking into consideration time recency and textual similarity to the query, as well as mail-specific signals such as users' actions.

  • Conference/Workshop Paper