Machine Learning Seminar


Analyses of Policy Gradient for Language Model Finetuning and Optimal Control

Date: March 27, 2024 Time: 10:30 - 11:30
Location: 1061, Meyer Building
Lecturer: Noam Razin
Gradient-based methods are the workhorse behind modern machine learning. While they have been extensively studied in the basic framework of supervised learning, they are far less understood in the framework of optimal control, which in its broadest form is equivalent to reinforcement learning. There, algorithms that learn a policy via gradient updates are known as policy gradient methods. In this talk, I will present two recent works analyzing the optimization dynamics and implicit bias of policy gradient, in different contexts. The first work identifies a vanishing gradients problem that occurs when using policy gradient to finetune language models. I will demonstrate the detrimental effects of this phenomenon and present possible solutions. The second work characterizes how the implicit bias of policy gradient affects extrapolation to initial states unseen in training, focusing on the fundamental Linear Quadratic Regulator (LQR) control problem. Overall, our results highlight that the optimization dynamics and implicit bias of policy gradient can substantially differ from those of gradient-based methods in supervised learning, and hence require dedicated study.
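For readers unfamiliar with the term, a policy gradient method updates policy parameters along a stochastic estimate of the gradient of expected return. The sketch below illustrates the classic REINFORCE estimator on a toy three-armed bandit; the bandit, the softmax policy, and the step size are illustrative assumptions and are not taken from the talk.

```python
# Minimal sketch of a REINFORCE policy gradient update on a toy
# three-armed bandit (all details here are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])  # hypothetical mean reward per arm
theta = np.zeros(3)                       # policy parameters (softmax logits)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)            # sample an action from the policy
    r = rng.normal(true_rewards[a], 0.1)  # observe a noisy reward
    # For a softmax policy, grad log pi(a) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi         # ascend the estimated return

print(softmax(theta))  # probability mass should concentrate on arm 2
```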
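For reference, the LQR problem mentioned above can be stated in its standard discrete-time form (the notation here is ours, not the speaker's):

```latex
% Standard discrete-time LQR: minimize a quadratic cost over
% linear state-feedback policies u_t = -K x_t (notation assumed).
\[
\min_{K} \; \mathbb{E}_{x_0}\!\left[ \sum_{t=0}^{\infty}
  \left( x_t^\top Q x_t + u_t^\top R u_t \right) \right],
\qquad x_{t+1} = A x_t + B u_t, \quad u_t = -K x_t,
\]
% with Q positive semidefinite and R positive definite; policy
% gradient methods optimize this cost directly over K.
```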
Noam Razin is a PhD candidate in the School of Computer Science at Tel Aviv University, where he is advised by Nadav Cohen. His research focuses on the foundations of modern machine learning. In particular, he aims to develop theories that shed light on how neural networks work, as well as bring forth principled methods for improving their reliability and performance.

 
