News

Featured Researcher - Keerthi Selvaraj

Keerthi Selvaraj was a professor at the Indian Institute of Science in Bangalore when a colleague first sparked his interest in machine learning. "My focus was large scale optimization and my friend informed me that machine learning offered ample scope for optimization," Keerthi says. "He pointed out that if someone like me got involved, I could contribute a lot to the field."

Keerthi has done just that. Since joining Yahoo! Research in 2003, he has developed a large-scale machine learning algorithm for support vector machines that is now used in applications across Yahoo!.

For instance, the algorithm is used to learn the taxonomies that automatically serve up contextual ads. The previous standard algorithm for support vector machines struggled to optimize over large data sets, so training the classifiers could take a day or two. With Keerthi's algorithm, training now takes less than an hour.

"At Yahoo! we are dealing with so much data, which means there are plenty of opportunities to solve large scale optimization problems," he says.

Yahoo! Search has also employed Keerthi’s machine learning algorithm to determine if a particular web page is spam or not. Spam is one of many indicators that the Search group uses to determine search relevance.

Human editors collect thousands of web pages and label each one as spam or not spam. These hand-labeled documents are then used as a training set for Keerthi's algorithm, which learns from the examples and then automatically classifies billions of other pages.
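To make the workflow above concrete, here is a minimal sketch of supervised spam classification with a linear support vector machine. It is not Keerthi's algorithm, whose details the article does not give. It is plain subgradient descent on the regularized hinge loss, and the four sample pages, the function names and the learning-rate settings are all assumptions made for illustration.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words counts for a lowercased, whitespace-tokenized page."""
    return Counter(text.lower().split())

def train_linear_svm(examples, epochs=20, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    examples: list of (text, label) pairs, label +1 for spam, -1 for not spam.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            score = sum(weights.get(w, 0.0) * c for w, c in feats.items()) + bias
            # Regularization step: shrink every weight slightly.
            for w in weights:
                weights[w] *= 1.0 - lr * reg
            # Hinge-loss step: only examples inside the margin move the weights.
            if label * score < 1:
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + lr * label * c
                bias += lr * label
    return weights, bias

def classify(weights, bias, text):
    """Return +1 (spam) or -1 (not spam) for an unseen page."""
    feats = featurize(text)
    score = sum(weights.get(w, 0.0) * c for w, c in feats.items()) + bias
    return 1 if score >= 0 else -1

# Hypothetical hand-labeled training set (stand-in for the editors' labels).
labeled_pages = [
    ("cheap pills buy now click here", 1),
    ("win free money click now", 1),
    ("research paper on convex optimization", -1),
    ("lecture notes on support vector machines", -1),
]

weights, bias = train_linear_svm(labeled_pages)
```

Once trained, `classify(weights, bias, page_text)` can be applied to any number of unlabeled pages. The cost of training on very large labeled sets is the part that Keerthi's work targets.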

This approach is called supervised learning, but Keerthi admits it is a time-consuming process that still requires a lot of manual effort. That's why he has spent much of his time recently exploring so-called semi-supervised learning, which supplements a small set of labeled examples with a much larger pool of unlabeled ones. "My work has concentrated on text classification problems and seeing if it's possible to take just a small number of labeled documents and improve the performance," he says.
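One simple way to see how unlabeled documents can help is self-training: a classifier fitted on a few labeled documents labels the unlabeled documents it is most confident about, then refits on the enlarged set. The sketch below illustrates that generic idea only. It is not the method from Keerthi's papers, and the word-count scorer, the sample documents and the `min_confidence` threshold are assumptions chosen to keep the example short.

```python
from collections import Counter

def word_scores(labeled):
    """Score each word: occurrences in spam documents minus occurrences in non-spam ones."""
    scores = Counter()
    for text, label in labeled:
        for word in text.lower().split():
            scores[word] += label
    return scores

def score(scores, text):
    """Positive totals suggest spam, negative totals suggest not spam, 0 means no evidence."""
    return sum(scores[w] for w in text.lower().split())

def self_train(labeled, unlabeled, min_confidence=2):
    """Repeatedly move the most confidently scored unlabeled document into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        scores = word_scores(labeled)
        best = max(pool, key=lambda t: abs(score(scores, t)))
        s = score(scores, best)
        if abs(s) < min_confidence:
            break  # nothing left that the current model is confident about
        labeled.append((best, 1 if s > 0 else -1))
        pool.remove(best)
    return word_scores(labeled)

# Hypothetical data: two labeled documents and four unlabeled ones.
labeled_docs = [
    ("cheap pills click now", 1),
    ("convex optimization lecture notes", -1),
]
unlabeled_docs = [
    "click now for free money",
    "free money casino bonus",
    "optimization seminar notes",
    "seminar notes on kernel methods",
]

before = word_scores(labeled_docs)
after = self_train(labeled_docs, unlabeled_docs)
```

With only the two labeled documents, the scorer has no evidence about words such as "casino" or "kernel". After self-training, those words have picked up scores from the unlabeled documents that were pulled in. A wrong early guess would be reinforced in the same way, which is one reason more careful semi-supervised methods are an active research topic.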

Last year alone, Keerthi presented several influential papers on semi-supervised learning at leading conferences like the ACM SIGIR Conference, the International Conference on Machine Learning and the Neural Information Processing Systems Conference.

Keerthi is also part of a team that formulates and solves large-scale optimization problems to adapt the sponsored search auction marketplace so that it delivers maximal value under advertiser budget constraints.

Currently, Keerthi is beginning work on a new information extraction project called MeYahoo. He is part of a team that is developing tools that can pull useful data from unstructured web pages.

MeYahoo, for example, could routinely monitor the pages of tens of thousands of computer scientists and pull out a range of unstructured data, such as recently published academic papers. This information could then be converted into a standard format and placed in an easily searchable database. To accomplish this, Keerthi is applying a new model to support vector machines.