Learning from the past: answering new questions with past answers

Feb 1, 2012

Abstract: Community-based Question Answering sites, such as Yahoo Answers or Baidu Zhidao, allow users to get answers to complex, detailed and personal questions from other users. However, since answering a question depends on the ability and willingness of users to address the asker’s needs, a significant fraction of the questions remain unanswered. We measured that in Yahoo Answers, this fraction represents 15% of all incoming English questions. At the same time, we discovered that around 25% of questions in certain categories are recurrent, at least at the question-title level, over a period of one year. We attempt to reduce the rate of unanswered questions in Yahoo Answers by reusing the large repository of past resolved questions, openly available on the site. More specifically, we estimate the probability that certain new questions can be satisfactorily answered by a best answer from the past, using a statistical model specifically trained for this task. We leverage concepts and methods from query-performance prediction and natural language processing in order to extract a wide range of features for our model. The key challenge here is to achieve a level of quality similar to the one provided by the best human answerers. We evaluated our algorithm on offline data extracted from Yahoo Answers, but more interestingly, also on online data by using three “live” answering robots that automatically provide past answers to new questions when a certain degree of confidence is reached. We report the success rate of these robots in three active Yahoo Answers categories in terms of accuracy, coverage and askers’ satisfaction. This work presents a first attempt, to the best of our knowledge, at automatic question answering for questions of a social nature, by reusing past answers of high quality.

  • WWW
  • Conference/Workshop Paper