Trust Region Policy Optimization
Author(s): John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
Venue: International Conference on Machine Learning
Year Published: 2015
Keywords: policy gradients, reinforcement learning
Expert Opinion: The TRPO paper led the way toward practical reinforcement learning (RL) for robotics (and other domains). It's much more sample efficient and robust than previous approaches, and it scales to high-dimensional continuous action spaces. It has become a very popular RL method in robotics, although it's slowly being replaced by descendants like Proximal Policy Optimization (PPO). TRPO has played a big part in reviving RL for robotics.