Machine Learning Seminar
Generalization in Stochastic Convex Optimization
Theoretical treatments of generalization often focus on the tension between expressivity and generalization: highly expressive models may fit the training data well but perform poorly on new, unseen data, while under-expressive models may fit poorly but are less prone to overfitting. Recently, however, this principle has been challenged by the success of deep neural networks, which are highly overparameterized and can achieve zero training error without apparent overfitting.
To better understand this conundrum, we examine the simplified setup of stochastic convex optimization (SCO). SCO is a special case of the general learning problem in which a learner must minimize a stochastic loss function given a finite sample of noisy instances, under the additional assumption that the loss functions are convex. It turns out that even in this simplified setting, the behavior of generalization is far more complex than commonly believed. A classical result in SCO shows that while overfitting can occur when the number of examples is smaller than the dimension, there are algorithms that avoid overfitting, much as deep networks do. SCO thus serves as a case study for understanding what enables certain algorithms to generalize while others fail. In this talk, we will review several recent works demonstrating how recent approaches to explaining generalization fall short in the case of SCO. For example, we will show limitations of explaining generalization through notions such as “implicit bias” and show that canonical learning algorithms cannot be described as regularized in any meaningful way. We will also discuss the interplay between memorization and learning, and illustrate how attempts to provide information-theoretic generalization bounds fail to account for the success of learning algorithms in SCO.
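As a minimal formalization of the setup described in the abstract (the notation is ours, not the speaker's): the learner observes i.i.d. samples $z_1, \dots, z_n \sim \mathcal{D}$ and must output a point $w_n$ approximately solving

$$\min_{w \in \mathcal{W}} \; F(w) := \mathbb{E}_{z \sim \mathcal{D}}\big[f(w, z)\big],$$

where $\mathcal{W} \subseteq \mathbb{R}^d$ is a bounded convex set and each $f(\cdot, z)$ is convex and Lipschitz. The classical separation alluded to above (due to Shalev-Shwartz, Shamir, Srebro, and Sridharan, 2009) is that an empirical risk minimizer can incur excess risk bounded away from zero when $n \ll d$, whereas stochastic gradient descent with averaged iterates $\bar{w}_n$ guarantees

$$\mathbb{E}\big[F(\bar{w}_n)\big] - \min_{w \in \mathcal{W}} F(w) \;\le\; O\!\left(1/\sqrt{n}\right),$$

with no dependence on the dimension $d$.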
Roi Livni is an assistant professor in the Department of Electrical Engineering at Tel Aviv University. His research focuses on theoretical machine learning. He received his Ph.D. from the Center for Brain Sciences (ELSC) at The Hebrew University of Jerusalem under the supervision of Amir Globerson.

