Seminar: Graduate Seminar
RoBOReL: Bayes Optimal Offline Meta Reinforcement Learning for Robotic Manipulation
Meta-reinforcement learning (meta-RL) agents are trained on a distribution of tasks to learn exploration strategies that help them identify the current environment, enabling rapid adaptation to new tasks. The fastest adaptation is achieved by Bayes-optimal exploration, and prior work has demonstrated high-performing approximations of it in simulation. However, these approaches are unsuitable for real robots, because learning online through physical interaction is costly and potentially unsafe. Offline meta-RL (OMRL) offers a possible solution: the meta-RL policy is learned from previously collected offline datasets. In this work, we present the first (approximately) Bayes-optimal OMRL method demonstrated on a challenging real-robot peg-insertion task. We focus on the real-robot implementation and design novel data collection and offline policy evaluation techniques. Importantly, our learned policy adapts much faster than previous methods, solving new tasks in only two attempts. These results indicate that Bayes-optimal OMRL is a promising direction for very fast task adaptation in robotics.
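To make the notion of Bayes-optimal exploration concrete, here is a minimal Python sketch, not the RoBOReL method itself but a toy illustration under assumed settings: two candidate tasks, each a two-armed Bernoulli bandit with known arm means, and a two-step horizon mirroring the two-attempt adaptation above. All names (TASKS, HORIZON, belief_update, value) are hypothetical. The agent maintains a posterior over which task it is in and plans by dynamic programming in belief space, so the first pull is chosen partly for the information it reveals and the second pull exploits the updated belief.

    # Toy Bayes-optimal exploration sketch (illustrative only, not RoBOReL).
    from functools import lru_cache

    # Hypothetical task distribution: arm success probabilities per task.
    TASKS = {
        "A": (0.9, 0.1),  # in task A, arm 0 is good
        "B": (0.1, 0.9),  # in task B, arm 1 is good
    }
    HORIZON = 2  # two attempts, mirroring fast adaptation

    def belief_update(b_a: float, arm: int, outcome: int) -> float:
        """Posterior P(task=A) after observing `outcome` (0/1) from `arm`."""
        p_a = TASKS["A"][arm] if outcome else 1.0 - TASKS["A"][arm]
        p_b = TASKS["B"][arm] if outcome else 1.0 - TASKS["B"][arm]
        evidence = b_a * p_a + (1.0 - b_a) * p_b
        return b_a * p_a / evidence

    @lru_cache(maxsize=None)
    def value(b_a: float, steps_left: int) -> float:
        """Bayes-optimal value of belief b_a with `steps_left` pulls remaining."""
        if steps_left == 0:
            return 0.0
        best = 0.0
        for arm in (0, 1):
            # Expected immediate reward under the current belief.
            p_success = b_a * TASKS["A"][arm] + (1.0 - b_a) * TASKS["B"][arm]
            # Expected future value, averaging over both outcomes, each of
            # which leads to a different updated belief.
            future = p_success * value(belief_update(b_a, arm, 1), steps_left - 1)
            future += (1.0 - p_success) * value(belief_update(b_a, arm, 0), steps_left - 1)
            best = max(best, p_success + future)
        return best

    if __name__ == "__main__":
        print(f"Bayes-optimal expected return over {HORIZON} pulls:",
              round(value(0.5, HORIZON), 3))

Running this prints an expected return of 1.32: the first pull earns 0.5 in expectation while identifying the task, after which the posterior concentrates (at 0.9 or 0.1) and the second pull exploits for 0.82.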
The speaker is an M.Sc. student under the supervision of Prof. Aviv Tamar.