Sixth Annual Machine Learning Symposium
TechTalks from event: Sixth Annual Machine Learning Symposium
Stochastic Algorithms for One-Pass LearningThe goal of the presentation is to describe practical stochastic gradient algorithms that process each training example only once, yet asymptotically match the performance of the true optimum. This statement needs, of course, to be made more precise. To achieve this, we'll review the works of Nevel'son and Has'minskij (1972), Fabian (1973, 1978), Murata & Amari (1998), Bottou & LeCun (2004), Polyak & Juditsky (1992), Wei Xu (2010), and Bach & Moulines (2011). We will then show how these ideas lead to practical algorithms that not only represent a new state of the art but are also arguably optimal.
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
Online Learning without a Learning Rate ParameterOnline learning is an approach to statistical inference based on the idea of playing a repeated game. A "master" algorithm recieves the prediction of N experts before making its own prediction. Then the outcome is revealed, and experts and master suffer a loss.
Algorithms have been developed for which the regret, the difference between the cumulative loss of the master and the cumulative loss of the best expert, is bounded uniformly over all sequences of expert predictions and outcome.
The most successful algorithms of this type are the exponential weights algorithms discovered by Littlestone and Warmuth and refined by many others. The exponential weights algorithm has a parameter, the learning rate, which has to be tuned appropriately to achieve the best bounds. This tuning typically depends on the number of experts and on the cumulative loss of the best expert. We describe a new algorithm - NormalHedge, which has no parameter and achieves comparable bounds to tuned exponential weights algorithms.
As the algorithm does not depend on the number of experts it can be used effectively when the set of experts grows as a function of time and when the set of experts is uncountably infinite.
In addition, the algorithm has a natural extension for continuous time and has a very tight analysis when the cumulative loss is described by an Ito process.
This is joint work with Kamalika Chaudhuri and Daniel Hsu.