Seminar: Machine Learning Seminar

ECE Women Community

Facilitating Prediction-Powered Inference with the Bootstrap

Date: March 12, 2025
Time: 11:30 - 12:30
Location: 506, Zisapel Building
Lecturer: Dan Kluger

Machine learning models are increasingly used to produce predictions that serve as input data in subsequent statistical analyses. For example, computer vision predictions of economic and environmental indicators based on satellite imagery are used in downstream regressions; similarly, language models are widely used to approximate human ratings and opinions in social science research. However, failing to properly account for errors in the machine learning predictions renders standard statistical procedures invalid. Prior work uses what we call the Predict-Then-Debias estimator to give valid confidence intervals when machine learning algorithms impute missing variables, assuming a small complete sample from the population of interest is available. We expand its scope by introducing bootstrap confidence intervals that apply when the complete data is a nonuniform (i.e., weighted, stratified, or clustered) sample, as well as in settings where an arbitrary subset of features is imputed. Importantly, the method can be applied in many settings without requiring additional calculations. We prove that these confidence intervals are valid without any assumptions on the quality of the machine learning model and are no wider than the intervals obtained by methods that do not use machine learning predictions.
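
To make the Predict-Then-Debias idea concrete, here is a minimal Python sketch of its simplest case only: estimating a population mean when the complete sample is drawn uniformly, paired with a plain percentile bootstrap that resamples the complete and incomplete samples independently. It is an illustration under stated assumptions, not the speaker's method: the function names, the simulated data and the deliberately biased "model" are all hypothetical, and the talk's contributions (weighted, stratified or clustered complete samples, imputing arbitrary subsets of features, regression targets) are not covered. In the simulation, averaging the predictions alone would center near 1.8 rather than the true mean of 2.0; the correction term estimated on the complete sample removes that bias.

```python
import numpy as np


def predict_then_debias_mean(y_complete, pred_complete, pred_incomplete):
    """Predict-Then-Debias estimate of a population mean (uniform sampling).

    The average ML prediction over the large incomplete sample is corrected
    by the average prediction error on the small complete sample, where both
    the true outcome and the prediction are observed.
    """
    rectifier = np.mean(y_complete - pred_complete)
    return np.mean(pred_incomplete) + rectifier


def bootstrap_ci(y_complete, pred_complete, pred_incomplete,
                 n_boot=2000, alpha=0.1, seed=0):
    """Percentile bootstrap CI; the two samples are resampled independently."""
    rng = np.random.default_rng(seed)
    n, big_n = len(y_complete), len(pred_incomplete)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx_c = rng.integers(0, n, size=n)          # resample complete rows
        idx_u = rng.integers(0, big_n, size=big_n)  # resample incomplete rows
        estimates[b] = predict_then_debias_mean(
            y_complete[idx_c], pred_complete[idx_c], pred_incomplete[idx_u])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Simulated example (hypothetical data): the true mean of Y is 2.0.
rng = np.random.default_rng(1)
x_c = rng.normal(size=300)                      # complete sample: X and Y
y_c = 2.0 + x_c + rng.normal(scale=0.5, size=300)
x_u = rng.normal(size=10_000)                   # incomplete sample: X only


def biased_model(x):
    """Hypothetical stand-in for a trained ML model; deliberately biased."""
    return 1.8 + 0.9 * x


est = predict_then_debias_mean(y_c, biased_model(x_c), biased_model(x_u))
lo, hi = bootstrap_ci(y_c, biased_model(x_c), biased_model(x_u))
print(f"estimate={est:.3f}, 90% CI=({lo:.3f}, {hi:.3f})")
```

Resampling the two samples separately mirrors the assumption here that they are drawn independently and uniformly; for weighted, stratified or clustered complete samples, the resampling step would presumably need to follow the sampling design, which is the kind of setting the talk addresses.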

Dan is a Michael Hammer Postdoctoral Fellow at MIT in the Institute for Data Systems and Society, hosted by Stephen Bates and Sherrie Wang. As a statistician and interdisciplinary researcher, he is broadly interested in developing statistical methods for applications in agriculture and remote sensing. His current research is on methods for conducting reliable statistical analyses that leverage widely available, yet error-prone, proxies. Dan recently completed his PhD in Statistics at Stanford University, where he was advised by Art Owen and David Lobell. While at Stanford, his research areas included multiple hypothesis testing, causal inference, measurement error, and crop rotation.