Seminar: Signal Processing and Systems

On Principal Component Regression in High Dimension

Date: January 26, 2025 Time: 13:30 - 14:30
Location: 1061, Meyer Building
Lecturer: Dr. Elad Romanov
Principal component regression (PCR) is a classical two-step approach to linear regression, where one first reduces the data dimension by projecting onto its leading principal components, and then performs ordinary least squares regression. We study PCR in an asymptotic high-dimensional regression setting, where the number of data points is proportional to the dimension. Our main deliverables are asymptotically exact limiting formulas for the estimation and prediction risks, which depend in a nuanced way on the eigenvalues of the population covariance, the alignment between the population principal components and the true signal, and the number of selected components.
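
As an illustration of the two-step procedure described above, here is a minimal NumPy sketch. It is not taken from the talk: the function name `pcr_fit`, the simulated data, and the choice to omit an intercept (assuming roughly centered data) are assumptions made for this example.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression with k components (no intercept).

    Assumes the columns of X are (approximately) centered.
    """
    # Step 1: the leading k principal directions of the sample covariance
    # are the top k right singular vectors of the design matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # shape (p, k)
    # Step 2: ordinary least squares of y on the projected data Z = X V_k.
    Z = X @ V_k
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Map the k-dimensional coefficients back to the original p coordinates.
    return V_k @ gamma

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p, k = 200, 50, 10
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + 0.1 * rng.standard_normal(n)

beta_hat = pcr_fit(X, y, k)             # PCR estimate using k components
```

When `k` equals the dimension `p` (and `X` has full column rank), this sketch reduces to ordinary least squares on the full data; smaller `k` restricts the estimate to the span of the leading sample principal components.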

A key challenge in the high-dimensional regime is that the sample covariance matrix is an inconsistent estimate of its population counterpart, and thus sample principal components may fail to capture potential latent low-dimensional structure in the data. We demonstrate this point through several case studies, including that of a spiked covariance matrix.
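
The inconsistency mentioned above can be seen in a small simulation. The sketch below is not one of the talk's case studies: it assumes a single-spike population covariance (identity plus one larger eigenvalue), and the sizes `n`, `p` and the value of `spike` are hypothetical choices. It measures the squared overlap between the top sample eigenvector and the true spike direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 800, 400        # number of samples proportional to the dimension
spike = 3.0            # assumed top population eigenvalue; all others are 1

# True spike direction: the first coordinate axis.
v = np.zeros(p)
v[0] = 1.0

# Rows are drawn from N(0, Sigma) with Sigma = I + (spike - 1) * v v^T.
X = rng.standard_normal((n, p))
X[:, 0] *= np.sqrt(spike)

# Sample covariance and its eigendecomposition (eigenvalues in ascending order).
S = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(S)
v_hat = eigvecs[:, -1]                  # top sample principal component

overlap = float((v_hat @ v) ** 2)       # equals 1 only if v_hat aligns with v
print(f"top sample eigenvalue: {eigvals[-1]:.3f} (population value: {spike})")
print(f"squared overlap with the true direction: {overlap:.3f}")
```

With `n` and `p` of comparable size, the squared overlap stays noticeably below 1 and the top sample eigenvalue is biased upward, so projecting onto sample principal components only partially recovers the population direction.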

The analysis of (random design) linear regression in high dimension typically builds on powerful results from random matrix theory, such as the Marchenko–Pastur law and deterministic equivalents for the resolvent of a sample covariance matrix. However, these standard tools alone are not sufficient for analyzing the prediction risk of PCR. To that end, we leverage and develop somewhat less standard techniques, which, to our knowledge, have not seen wide use in the statistics literature to date: multi-resolvent traces and their associated eigenvector overlap measures.

Based on joint work with Alden Green (Stanford).
