Key Scientific Challenges, Entry #6: Machine Learning
On January 27 we announced the kick-off of our 2010 Key Scientific Challenges Program. To highlight the scientific challenge areas included in the program, we launched a series of guest blog posts earlier this month on Yodel Anecdotal. Read our previous post on Web Information Management.
Another big challenge our Yahoo research scientists are continually examining is machine learning. In this entry, John Langford from Yahoo Labs shares some thoughts on how Yahoo is driving research into machine learning and why it’s a fascinating field.
When I wake up in the morning, I can't resist checking my email and browsing the Internet to see if anything has come up. Then I get to work thinking, writing, searching, finding, and learning various things, all using an Internet that’s powered by machine learning in dozens of ways. When I go to sleep at night, I smile because I know that in addition to using machine learning throughout my day, I’ve also done my part to advance machine learning technology, that many others have done likewise, and that by doing it better we’re making a major impact on people’s lives.
Even though machine learning has such a broad influence on the Internet, it can be quite difficult to recognize. This is primarily because machine learning’s benefits are often hidden -- they are the spam emails you don't see, the uninteresting news articles you don't see, and the irrelevant search results you don't see, just to name a few. In this sense, machine learning is like an invisible hand. It’s also sometimes easier to recognize the flaws in a machine learning system – like "Why did my email end up in my friend’s spam folder?" – than it is to notice its benefits. But despite these quirks, machine learning is one of the best technologies we have for solving some of the biggest problems on the Web.
The problem of spam is representative of why machine learning is so effective. Spammers are constantly changing and adapting their strategies and technology to evade even the most capable filters. Machine learning attacks this problem by aiming to build an automatic system capable of staying ahead of the game and continually refining itself in response to its environment. We haven’t completely achieved that goal yet, but progress is steady. Machine learning systems can always get better, learn more, work faster and in ever more ways, because people will always want less spam and more interesting and relevant news articles.
Naturally, this reality means we’re constantly running into both the empirical and theoretical boundaries of machine learning and statistics. How do we learn from so much data that it can’t fit on a single machine? How do we extract evidence of what the best decision was? What if the best decision changes? How do we minimize the need to know the best decision? How do we effectively use the incredibly large quantities of information available on the Internet? How do we fit it all together in an automatic way that is useful to someone? And how do we know it's useful?
Good answers to these questions can improve the life of just about everyone, which is the core reason why Yahoo is sponsoring the Key Scientific Challenges Program. If you are a graduate student working on these questions, you understand how exciting and challenging this field is. And if you aren't, consider the satisfaction associated with changing fields :-)