
Learning through Exploration

Publication
Dec 31, 1969
Abstract

I will talk about interactive learning applied to several core problems at Yahoo. Solving these problems well requires learning from user feedback. The difficulty is that feedback is observed only for what is actually shown to the user. This need for exploration makes these problems fundamentally different from standard supervised learning problems: if a choice is never explored, we cannot optimize for it. Through examples, I will discuss the importance of gathering the right data. I will then discuss how to reuse data collected by production systems for offline evaluation and direct optimization. Being able to measure performance reliably offline allows for much faster experimentation, shifting from guess-and-check with A/B testing to direct optimization.
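
To make the idea of offline evaluation from logged data concrete, here is a minimal sketch of one common estimator, inverse propensity scoring. The abstract does not name a specific method, so this choice is an assumption for illustration only. The record layout, the function name `ips_value`, and the example log are all hypothetical.

```python
from typing import Callable, Hashable, Iterable, Tuple

# One logged interaction (hypothetical layout): context, action shown,
# observed reward, and the probability with which the logging policy
# chose that action.
LogRecord = Tuple[Hashable, int, float, float]


def ips_value(
    logs: Iterable[LogRecord],
    target_policy: Callable[[Hashable], int],
) -> float:
    """Estimate the average reward target_policy would have earned.

    Only records where the target policy picks the logged action
    contribute; each is reweighted by 1 / propensity to correct for how
    often the logging policy explored that action.
    """
    total = 0.0
    count = 0
    for context, action, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logged propensity must be positive")
        count += 1
        if target_policy(context) == action:
            total += reward / propensity
    if count == 0:
        raise ValueError("no logged records")
    return total / count


# Hypothetical log: uniform random exploration over two actions,
# so every logged propensity is 0.5.
example_logs = [
    ("sports", 0, 1.0, 0.5),
    ("sports", 1, 0.0, 0.5),
    ("news", 0, 0.0, 0.5),
    ("news", 1, 1.0, 0.5),
]


def match_policy(context: Hashable) -> int:
    # Hypothetical candidate policy: action 0 for sports, action 1 otherwise.
    return 0 if context == "sports" else 1


def always_zero(context: Hashable) -> int:
    # Hypothetical baseline policy: always show action 0.
    return 0


print(ips_value(example_logs, match_policy))
print(ips_value(example_logs, always_zero))
```

The estimate is only meaningful if the logging system actually explored: an action that was never shown has no records to reweight, which is the point the abstract makes about unexplored choices.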
