Seminar: Graduate Seminar

Investigating the Implicit Bias in Over-parametrized Models

Date: April 11, 2024
Time: 10:30–11:30
Location: Room 1061, Meyer Building
Lecturer: Mor Shpigel Nacson

Despite the remarkable performance of neural networks in recent years, our theoretical understanding of their success remains incomplete. In particular, neural networks used in practice are typically overparameterized, meaning they have more parameters than training samples. Consequently, many minimizers of the training loss fit the training data, each with distinct generalization properties. Nevertheless, neural networks trained in practice typically converge to “good” solutions, i.e., minima that generalize well. In recent years, it has been established that implicit biases introduced by the training algorithm play a pivotal role in this phenomenon, guiding the optimization procedure to favor certain solutions over others.
In this study, we address the implicit bias research question, which seeks to uncover objectives that are implicitly minimized by optimization algorithms, beyond the training loss. First, we examine how the characteristics of the loss function affect the implicit bias for linear models trained with the gradient descent (GD) optimization algorithm. Then, we investigate how the step size affects the implicit bias for linear diagonal neural network models. Specifically, using dynamical stability, we demonstrate the significant role that large step sizes can play in inducing sparsity. Finally, we explore the behavior of GD during the Edge of Stability (EoS) phase, characterized by chaotic and oscillatory behavior in training loss and sharpness. By identifying a quantity that consistently decreases during GD training, we gain a better theoretical understanding of the optimization dynamics during the EoS phase and how they can lead to wider solutions.
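As a minimal illustration of the implicit-bias phenomenon discussed above (not of the seminar's own results), consider gradient descent on an overparameterized least-squares problem: initialized at zero, GD's iterates remain in the row space of the data matrix, so among the infinitely many interpolating solutions it converges to the one of minimum Euclidean norm. A short NumPy sketch, with synthetic data and hyperparameters chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 50                      # overparameterized: more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on the squared loss, initialized at zero.
w = np.zeros(d)
lr = 0.005                         # small enough for stable convergence here
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

# Among the many interpolators, GD from zero init selects the
# minimum-l2-norm one, given in closed form by the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

fits_data = np.allclose(X @ w, y, atol=1e-5)          # interpolates the training set
is_min_norm = np.allclose(w, w_min_norm, atol=1e-5)   # matches the min-norm solution
```

The squared loss alone does not distinguish between interpolators; the preference for the minimum-norm one is supplied entirely by the algorithm and its initialization, which is exactly the kind of implicit bias the talk investigates for richer settings (general loss functions, diagonal networks, large step sizes).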

Ph.D. seminar, under the supervision of Prof. Daniel Soudry.
