Invited talk by Charles Elkan
Abstract: Many real-world applications of machine learning have
multilabel classification at their core. This talk will present
progress towards a multilabel learning method that can handle 10^7
training examples, 10^6 features, and 10^5 labels on a single
workstation. A sparse linear model is learned for each label, with
all labels trained simultaneously by stochastic gradient descent with
L2 and L1 regularization. Tractability is achieved through careful
use of sparse data structures, and speed through recent stochastic
gradient methods that perform variance reduction. Both in theory and
in practice, these methods converge an order of magnitude faster than
Adagrad. We have extended
them to handle non-differentiable L1 regularization. We show
experimental results on classifying biomedical articles into 26,853
scientific categories. [Joint work with Galen Andrew, ML intern at
Amazon.]
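
To make the ingredients above concrete, here is a minimal sketch, not
the speaker's implementation: an SVRG-style variance-reduced gradient
estimate for L2-regularized logistic loss on a single label, with the
non-differentiable L1 term handled by a proximal soft-thresholding
step, operating on a scipy.sparse matrix. All function names, data,
and hyperparameters are illustrative assumptions.

    import numpy as np
    from scipy.sparse import random as sparse_random

    def soft_threshold(w, t):
        # Proximal operator of t * ||w||_1: componentwise soft-thresholding.
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    def reg_logistic_grad(X, y, w, i, l2):
        # Gradient of L2-regularized logistic loss on example i, labels in {-1, +1}.
        xi = X.getrow(i)
        margin = y[i] * xi.dot(w)[0]
        return (-y[i] / (1.0 + np.exp(margin))) * xi.toarray().ravel() + l2 * w

    def prox_svrg(X, y, l1=1e-4, l2=1e-4, step=0.1, epochs=10):
        # Prox-SVRG sketch: variance-reduced SGD on the smooth part
        # (logistic loss + L2), proximal step for the L1 penalty.
        n, d = X.shape
        w = np.zeros(d)
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            snapshot = w.copy()  # anchor point for variance reduction
            full_grad = np.mean(
                [reg_logistic_grad(X, y, snapshot, i, l2) for i in range(n)],
                axis=0)
            for _ in range(n):
                i = rng.integers(n)
                # Unbiased gradient estimate whose variance shrinks
                # as w approaches the snapshot.
                g = (reg_logistic_grad(X, y, w, i, l2)
                     - reg_logistic_grad(X, y, snapshot, i, l2) + full_grad)
                w = soft_threshold(w - step * g, step * l1)
        return w

    # Toy usage on random sparse data (illustrative only).
    rng = np.random.default_rng(1)
    X = sparse_random(200, 50, density=0.1, format="csr", random_state=1)
    y = np.where(rng.random(200) < 0.5, -1, 1)
    w = prox_svrg(X, y)
    print("nonzero weights:", np.count_nonzero(w))

At the scale stated in the abstract, the dense per-example gradient
above would be replaced by lazy updates that touch only the nonzero
features of each example; the sketch favors readability over
efficiency.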
Bio: Charles Elkan is the first Amazon Fellow, on leave from his position as professor of computer science at the University of California, San Diego. He has previously been a visiting associate professor at Harvard and a researcher at MIT. His published research is mainly in machine learning, data science, and computational biology. The MEME algorithm that he developed with his Ph.D. students has been used in over 3,000 published research projects in biology and computer science. He is fortunate to have had inspiring undergraduate and graduate students who now hold leadership positions, including a vice president at Google and a professor at the University of Washington.