Bounding catastrophic forgetting in linear regression
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions.
We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds.
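A minimal formalization of this setting follows. The notation is our own assumption, and the paper's exact notation and normalization may differ. Each task m is a realizable linear system X_m w = y_m, meaning one w* solves every task. Tasks are visited in an ordering τ, and each task is trained with gradient descent to convergence, starting from the previous weights. With a small enough step size, gradient descent on a least-squares loss never leaves w_{t-1} plus the row space of X_{τ(t)}. It therefore converges to the solution closest to w_{t-1}, which gives the update below. Forgetting is then the average loss of the final weights on the tasks seen so far.

```latex
% Assumed notation: X^{+} is the Moore--Penrose pseudoinverse, \tau(t) the task at iteration t.
w_t = w_{t-1} + X_{\tau(t)}^{+}\bigl(y_{\tau(t)} - X_{\tau(t)}\, w_{t-1}\bigr),
\qquad t = 1, \dots, k,

F(k) = \frac{1}{k} \sum_{t=1}^{k} \bigl\lVert X_{\tau(t)}\, w_k - y_{\tau(t)} \bigr\rVert_2^2 .
```

In this form, each update is the orthogonal projection of w_{t-1} onto the affine solution set of task τ(t).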
We establish connections between continual learning in the linear setting and two other research areas: alternating projections and the Kaczmarz method.
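The link to the Kaczmarz method can be checked numerically in the simplest case, assumed here for illustration: each task is a single equation <x, w> = y. The pure-Python sketch below uses hypothetical helper names and numbers. It runs plain gradient descent on one such task, starting from a previous weight vector, and compares the result with the Kaczmarz step, which is the orthogonal projection onto the hyperplane {w : <x, w> = y}.

```python
# Illustrative toy (hypothetical names/values): one task = one linear equation <x, w> = y.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gd_to_convergence(w0, x, y, lr=0.1, steps=2000):
    """Plain gradient descent on 0.5 * (<x, w> - y)**2, started at the previous weights w0."""
    w = list(w0)
    for _ in range(steps):
        r = dot(x, w) - y  # residual on the current task
        w = [wi - lr * r * xi for wi, xi in zip(w, x)]  # gradient is r * x
    return w

def kaczmarz_projection(w0, x, y):
    """Orthogonal projection of w0 onto the hyperplane {w : <x, w> = y}."""
    step = (y - dot(x, w0)) / dot(x, x)
    return [wi + step * xi for wi, xi in zip(w0, x)]

w0 = [1.0, -2.0, 0.5]          # weights left over from earlier tasks
x, y = [0.3, 1.0, -0.7], 2.0   # the new task
w_gd = gd_to_convergence(w0, x, y)
w_proj = kaczmarz_projection(w0, x, y)
print("max |w_gd - w_proj| =", max(abs(a - b) for a, b in zip(w_gd, w_proj)))
```

Gradient descent only ever moves w0 along x, so its limit is the point on the hyperplane closest to w0. For the residual to shrink on every step, lr times the squared norm of x must lie in (0, 2).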
In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas.
Under a cyclic task ordering, we derive a dimensionality-independent universal bound on the worst-case forgetting. This contrasts with convergence to the offline solution, which, according to existing alternating-projection results, can be arbitrarily slow. We further show that a better bound can be achieved when tasks are presented in a random ordering.
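The following sketch shows how the two orderings can be run and measured side by side. It is an illustrative toy, not the paper's experiments. The sizes, the Gaussian task generator, single-equation tasks, and uniform sampling with replacement for the random ordering are all our assumptions. Each task is trained to convergence by projecting onto its solution set. Forgetting is the average squared residual of the final weights over the tasks visited. The sketch does not reproduce or verify the paper's bounds.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project(w, x, y):
    """Train to convergence on task <x, w> = y: orthogonal projection onto its solution set."""
    step = (y - dot(x, w)) / dot(x, x)
    return [wi + step * xi for wi, xi in zip(w, x)]

def forgetting(w, seen):
    """Average squared residual of w over the tasks visited so far (repeats included)."""
    return sum((dot(x, w) - y) ** 2 for x, y in seen) / len(seen)

def run(tasks, order):
    """Visit tasks in the given order, starting from w = 0; return the final forgetting."""
    w = [0.0] * len(tasks[0][0])
    seen = []
    for m in order:
        x, y = tasks[m]
        w = project(w, x, y)
        seen.append(tasks[m])
    return forgetting(w, seen)

rng = random.Random(0)
d, T, k = 50, 5, 100  # hypothetical sizes: dimension (d > T, overparameterized), tasks, iterations
w_star = [rng.gauss(0, 1) for _ in range(d)]  # realizable: w_star solves every task
tasks = []
for _ in range(T):
    x = [rng.gauss(0, 1) for _ in range(d)]
    tasks.append((x, dot(x, w_star)))

cyclic = [t % T for t in range(k)]                  # 0, 1, ..., T-1, 0, 1, ...
random_order = [rng.randrange(T) for _ in range(k)]  # uniform, with replacement
print("cyclic forgetting:", run(tasks, cyclic))
print("random forgetting:", run(tasks, random_order))
```

The worst-case behavior that the bounds address depends on how the tasks' solution sets are oriented relative to each other. Nearly orthogonal random tasks like these are an easy case, so this toy illustrates the mechanics rather than the bounds.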
Evron, Moroshko, Ward, Srebro, and Soudry. How catastrophic can catastrophic forgetting be in linear regression? COLT 2022.
* Itay Evron is a Ph.D. student in the Electrical Engineering Department at the Technion, under the supervision of Professor Daniel Soudry.