Typical Generalization of Neural Networks with Narrow Teachers

Date: July 15, 2024 Time: 14:30 - 15:30

Location: 1061, Meyer Building

Lecturer: Gon Buzaglo

Research Areas:

Machine learning and intelligent systems

A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., when they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one of its variants. However, recent empirical work examined the generalization of a random NN that interpolates the data: the NN was sampled from a seemingly uniform prior over the parameters, conditioned on the NN perfectly classifying the training set. Interestingly, such a NN sample typically generalized as well as SGD-trained NNs.

We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels. Specifically, we show that such a "flat" prior over the NN parameterization induces a rich prior over the NN functions, due to the redundancy in the NN structure. In particular, this creates a bias towards simpler functions, which require fewer relevant parameters to represent. This enables learning with a sample complexity approximately proportional to the complexity of the teacher (roughly, the number of non-redundant parameters), rather than the student's.
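The "random interpolator" idea in the abstract can be illustrated with a small guess-and-check sketch. It is only a toy under stated assumptions, not the setup analyzed in the talk: the single-unit teacher, the student width, the uniform range [-1, 1], the sign activations, and all function names are hypothetical choices made for illustration. Parameters are drawn from a flat prior and the draw is kept only if the resulting network classifies the whole training set correctly.

```python
import random

def teacher(x):
    # Assumed narrow teacher: a single threshold unit on the first coordinate.
    return 1 if x[0] > 0 else -1

def sample_student(rng, dim, width):
    # Draw every parameter i.i.d. from a flat prior on [-1, 1] (an assumption).
    hidden = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
    out = [rng.uniform(-1, 1) for _ in range(width)]
    return hidden, out

def predict(student, x):
    # Two-layer network with sign activations and no biases.
    hidden, out = student
    acts = [1 if sum(w * xi for w, xi in zip(row, x)) > 0 else -1
            for row in hidden]
    return 1 if sum(o * a for o, a in zip(out, acts)) > 0 else -1

def sample_interpolator(rng, train, dim, width, max_tries=100_000):
    # Guess and check: resample until the student fits every training label.
    for _ in range(max_tries):
        student = sample_student(rng, dim, width)
        if all(predict(student, x) == y for x, y in train):
            return student
    raise RuntimeError("no interpolator found within max_tries")

def make_data(rng, n, dim):
    # Inputs uniform on [-1, 1]^dim, labeled by the teacher.
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    return [(x, teacher(x)) for x in xs]

def accuracy(student, data):
    return sum(predict(student, x) == y for x, y in data) / len(data)

rng = random.Random(0)
dim, width = 2, 3
train = make_data(rng, 8, dim)
test = make_data(rng, 500, dim)

student = sample_interpolator(rng, train, dim, width)
train_acc = accuracy(student, train)
test_acc = accuracy(student, test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The training accuracy is 1.0 by construction, since only interpolating draws are accepted. The test accuracy depends on the seed and on the toy choices above, so it should not be read as evidence for or against the result presented in the talk.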

Gon Buzaglo is an M.Sc. student under the supervision of Prof. Daniel Soudry.

Seminar: Graduate Seminar
