Massive, Sparse, Efficient Multilabel Learning

Invited talk by Charles Elkan

Abstract: Many real-world applications of machine learning have multilabel classification at their core. This talk will present progress towards a multilabel learning method that can handle 10^7 training examples, 10^6 features, and 10^5 labels on a single workstation. Sparse linear models for all labels are learned simultaneously by stochastic gradient descent with L2 and L1 regularization. Tractability is achieved through careful use of sparse data structures, and speed is achieved by using the latest stochastic gradient methods that perform variance reduction. Both theoretically and practically, these methods achieve order-of-magnitude faster convergence than Adagrad. We have extended them to handle non-differentiable L1 regularization. We show experimental results on classifying biomedical articles into 26,853 scientific categories. [Joint work with Galen Andrew, ML intern at Amazon.]
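The abstract does not spell out the algorithmic details, but the per-label update it describes can be sketched as proximal stochastic gradient descent: the smooth part of the objective (logistic loss plus L2) is handled by an ordinary gradient step, while the non-differentiable L1 penalty is handled by a soft-thresholding proximal step, which is what drives weights exactly to zero and yields sparse models. The function names and hyperparameters below are illustrative assumptions, not from the talk, and the variance-reduction component the speakers use is omitted for brevity; this is a minimal sketch of the general technique, not the authors' actual system.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each weight toward zero,
    # setting it exactly to zero when its magnitude is below t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_sgd_logistic(X, y, lr=0.1, l1=0.01, l2=1e-4, epochs=20, seed=0):
    """Train one binary linear model (one label) with proximal SGD.

    Labels y are in {-1, +1}. The L2 term is folded into the gradient
    step; the L1 term is applied via the proximal (soft-threshold) step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * X[i].dot(w)
            # Gradient of logistic loss + L2 penalty (the smooth part).
            g = -y[i] * X[i] / (1.0 + np.exp(margin)) + l2 * w
            # Gradient step, then proximal step for the L1 penalty.
            w = soft_threshold(w - lr * g, lr * l1)
    return w
```

In a multilabel setting along the lines the abstract describes, one such sparse weight vector would be trained per label, with sparse data structures keeping the per-example cost proportional to the number of nonzero features rather than the full dimensionality.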

Charles Elkan is the first Amazon Fellow, on leave from his position as a professor of computer science at the University of California, San Diego. In the past, he has been a visiting associate professor at Harvard and a researcher at MIT. His published research has been mainly in machine learning, data science, and computational biology. The MEME algorithm that he developed with Ph.D. students has been used in over 3000 published research projects in biology and computer science. He is fortunate to have had inspiring undergraduate and graduate students who now hold leadership positions, including vice president at Google and professor at the University of Washington.