PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Author(s): Marc Peter Deisenroth, Carl Edward Rasmussen
Venue: International Conference on Machine Learning
Year Published: 2011
Keywords: state estimation, reinforcement learning, probabilistic models, gaussians, dynamical systems, visual perception, policy gradients
Expert Opinion: In principle, model-based RL offers many advantages for robot learning, such as efficient use of data and the ability to predict in advance how a trajectory will roll out. In practice, however, getting model-based RL to work has proved very difficult. In this work, the authors tackle a key difficulty: when optimizing a policy against a dynamics model learned from data, model errors get exploited by the optimization algorithm. A very elegant solution is proposed: uncertainty estimates should be incorporated into the decision-making process, thereby discouraging the optimizer from visiting states where model uncertainty is high and the predictions are likely to be wrong. This intuitive idea is implemented using Gaussian processes, which offer a principled approach to modeling uncertainty in continuous dynamical systems. The resulting algorithm, PILCO, is demonstrated to be highly sample-efficient, improving upon the state of the art by orders of magnitude. This paper introduced several key ideas that have since been adopted by many subsequent works on robot learning and model-based RL.
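The core mechanism can be illustrated with a minimal NumPy sketch (not the paper's full moment-matching rollout machinery): a GP dynamics model's predictive variance grows away from the training data, and folding that variance into an expected cost makes uncertain states look expensive, which is what discourages the optimizer from exploiting model errors. All function names, the toy 1-D dynamics, and the hyperparameter values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf(a, b, ls=1.0, sf=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sf**2 * np.exp(-0.5 * d2 / ls**2)

# Toy 1-D dynamics dataset: x_{t+1} = sin(x_t) observed with noise
# (stand-in for the state-transition data PILCO would collect).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, 25)
y = np.sin(X) + 0.05 * rng.standard_normal(25)

sn = 0.05  # assumed observation-noise std
K = rbf(X, X) + sn**2 * np.eye(len(X))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

def gp_predict(xq):
    """GP posterior mean and variance at query points xq."""
    ks = rbf(xq, X)
    mu = ks @ alpha
    v = np.linalg.solve(L, ks.T)
    var = rbf(xq, xq).diagonal() - np.sum(v**2, axis=0) + sn**2
    return mu, var

def expected_cost(mu, var, target=0.0):
    # For a quadratic cost, E[(x - target)^2] under x ~ N(mu, var)
    # equals (mu - target)^2 + var: predictive uncertainty directly
    # inflates the expected cost of visiting that state.
    return (mu - target) ** 2 + var

# Near the data the model is confident; far outside the training
# region the predictive variance reverts toward the prior.
_, var_in = gp_predict(np.array([0.0]))
_, var_out = gp_predict(np.array([8.0]))
print(var_in[0] < var_out[0])  # the uncertain state is costlier to visit
```

Evaluating `expected_cost` along a simulated rollout (rather than at single points) gives the flavor of PILCO's policy evaluation: trajectories that wander into poorly modeled regions accumulate variance and are scored worse, even if their predicted means look good.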