Seminar: Guest Lecture
LLM Post-Training and Reasoning via Efficient Value-Based RL.
Reinforcement learning (RL) has a newfound killer application in post-training LLMs: adapting models pre-trained to predict the next token to tasks like instruction following, math problem solving, and generating content or recommendations that maximize user outcomes. But are the same RL algorithms that animated robots and conquered Atari the right ones to post-train LLMs? In this talk I will present new value-based algorithms for post-training and for scaling test-time compute that leverage both the unique structure of autoregressive LLMs and recent advances in improving efficiency by changing the Q-learning loss function. I will show how (and argue why) these new algorithms achieve state-of-the-art performance on frontier math reasoning tasks with smaller models and at a fraction of the test-time FLOPs.
The talk covers these papers:
https://arxiv.org/abs/2505.17373 Value-Guided Search for Efficient Chain-of-Thought Reasoning (NeurIPS ’25)
https://arxiv.org/abs/2502.20548 Q#: Provably Optimal Distributional RL for LLM Post-Training (NeurIPS ’25)
https://arxiv.org/abs/2409.12799 The Central Role of the Loss Function in Reinforcement Learning (Statistical Science)
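To give a flavor of the first paper's theme, value-guided search can be thought of as a beam-style procedure in which a learned value model scores partial chains of thought and concentrates test-time compute on the most promising ones. The snippet below is only a minimal illustrative sketch with hypothetical stand-in functions (`sample_continuations`, `value`), not the authors' implementation or interfaces.

```python
# Illustrative sketch of value-guided search over chains of thought:
# sample several candidate continuations per step, score each partial
# solution with a value model, and keep only the highest-value candidates.
import random

random.seed(0)

def sample_continuations(prefix: str, k: int) -> list[str]:
    # Stand-in for sampling k short continuations from an autoregressive LLM.
    return [prefix + f" step{random.randint(0, 99)}" for _ in range(k)]

def value(partial_solution: str) -> float:
    # Stand-in for a value model estimating how likely this partial chain
    # of thought is to lead to a correct final answer.
    return random.random()

def value_guided_search(prompt: str, beam_width: int = 4,
                        samples_per_beam: int = 4, max_steps: int = 8) -> str:
    beams = [prompt]
    for _ in range(max_steps):
        candidates = [c for b in beams
                      for c in sample_continuations(b, samples_per_beam)]
        # Rank partial solutions by estimated value and keep the best few,
        # steering test-time compute toward promising reasoning paths.
        candidates.sort(key=value, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

print(value_guided_search("Solve: 12 * 7 = ?"))
```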
Bio:
Nathan Kallus is an associate professor of operations research and information engineering at Cornell Tech and Cornell Engineering. He also serves as Research Director for Machine Learning and Inference at Netflix.
Kallus's research interests include causal inference, especially when combined with machine learning; the statistics of optimization under uncertainty; sequential and dynamic decision making; and algorithmic fairness. He is the author of the book "Applied Causal Inference Powered by ML and AI."

