Paradigm changes in machine learning are rare and often driven by changes in technology. The availability of computers led to a bloom in applied statistical methods in the 60s. Workstations with sufficient amounts of memory led to the use of nonparametric tools and matrix computation such as kernel methods in the 80s and 90s. Clusters of workstations made large-scale data-parallel computation possible in this decade. At present we are at the cusp of a new paradigm change: multi-core processing, which arises because processor speeds are no longer increasing. Instead, the number of cores per computer is growing steadily every year.
The NIPS workshop on large-scale machine learning, held in Whistler on December 11, gathered a rather diverse community of researchers working on this problem. What makes large-scale machine learning exciting is the fact that it requires a combination of technologies ranging from microprocessor architecture and distributed systems to convergence properties of statistical estimators and approximate samplers. Highlights included Viktor Prasanna’s talk on parallel inference for graphical models which addressed issues such as cache coherency and non-uniform architectures for probabilistic inference, John Langford’s talk (and a special demo) on Vowpal Wabbit, a blazingly fast online learning code, and Joey Gonzalez’s talk on convergence strategies for message passing on large factor graphs.
As is common with many new and exciting fields of research, there was a bloom of diverse ideas. These included implementations of algorithms on graphics cards and even FPGAs (Field-Programmable Gate Arrays) on the one hand, and the adaptation of parallel estimation algorithms to existing computational strategies such as MapReduce (parallel logistic models) on the other. At present only a small number of computational paradigms map efficiently onto existing parallelization structures (Google's MapReduce or Yahoo's Hadoop, Microsoft's Dryad, and streaming architectures such as Yahoo's S4), and we will likely see a convergence of systems technologies and suitable algorithms over the next decade.
It is highly likely that within a decade all widely used machine learning algorithms will be parallel. Given the increasing need for intelligent data analysis, the importance of this trend is hard to overestimate.
For more information on the workshop, visit http://www.select.cs.cmu.edu/meetings/biglearn09/