Graduate Seminar
Out-of-Distribution Generalization in Decision Making
Recent advances in deep learning allow AI models to surpass humans on numerous perception tasks. However, their real-world success is determined by their ability to generalize to unseen, out-of-distribution (OOD) test tasks.
In this talk, we cover several works: first, how to detect OOD samples, and then, what decisions to make when facing OOD test tasks, in both reinforcement learning and supervised (imitation learning) settings. The first work introduces the residual flow (CVPR '19), a simple flow architecture that accurately learns the distribution of feature activations in the training data. Using this distribution, we can detect when a test sample is "too different" from the training data, i.e., is suspected to be an OOD sample, which may lead to poor prediction results. For OOD detection in image datasets, the residual flow provides a principled improvement over contemporary models. The second work (NeurIPS '23) studies zero-shot generalization in reinforcement learning. We present the Explore to Generalize (ExpGen) algorithm, which builds on the insight that a policy for effective exploration of the domain is harder to memorize than a policy that maximizes reward, and therefore generalizes better to OOD tasks. Finally, we expand upon this idea toward generalization in Behavioral Cloning (BC). We show that hiding some of the task information from the human demonstrator, i.e., "blindfolding" the expert, compels the expert to employ non-trivial exploration in order to solve the task. The resulting cloned policy exhibits better generalization, both in theory and in practice, on a challenging video game and on real-world robotic peg-insertion tasks.
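To make the detection recipe of the first work concrete, below is a minimal sketch (not the talk's or the paper's implementation): a multivariate Gaussian stands in for the learned residual flow over feature activations, and test samples whose log-likelihood falls below a threshold calibrated on the training data are flagged as OOD. All data, names, and numbers are illustrative assumptions.

    # Minimal OOD-detection sketch: fit a density model to training
    # feature activations, then flag low-likelihood test samples.
    # A Gaussian is used here as a stand-in for the residual flow.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for feature activations extracted from the training set.
    train_feats = rng.normal(size=(5000, 16))

    # Fit a simple density model (mean + covariance) to the features.
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    d = train_feats.shape[1]

    def log_likelihood(x):
        """Gaussian log-density of feature vectors x under the fitted model."""
        diff = x - mean
        maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

    # Calibrate a threshold: flag anything less likely than 99% of training data.
    threshold = np.quantile(log_likelihood(train_feats), 0.01)

    # A sample far from the training distribution scores below the threshold.
    in_dist = rng.normal(size=(1, 16))
    shifted = rng.normal(loc=6.0, size=(1, 16))
    print("in-dist flagged OOD:", log_likelihood(in_dist)[0] < threshold)
    print("shifted flagged OOD:", log_likelihood(shifted)[0] < threshold)

The residual flow in the paper replaces the Gaussian with a learned invertible model, but the thresholding logic at test time is the same: low log-likelihood under the training-feature density signals a suspected OOD sample.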
PhD student under the supervision of Prof. Aviv Tamar.