end-to-end training of deep visuomotor policies

Author(s): Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel
Venue: Journal of Machine Learning Research
Year Published: 2016
Keywords: manipulation, probabilistic models, planning, locomotion, learning from demonstration, reinforcement learning, neural networks, visual perception
Expert Opinion: It introduced end-to-end training with impressive results going from pixels to torques on several interesting tasks.

autonomous helicopter aerobatics through apprenticeship learning

Author(s): Pieter Abbeel, Adam Coates and Andrew Y. Ng
Venue: International Journal of Robotics Research
Year Published: 2010
Keywords: learning from demonstration, optimal control, dynamical systems
Expert Opinion: This paper presents a beautiful and compelling demonstration of the strength of learning dynamical models and using optimal control to learn complex tasks on intrinsically unstable systems, even if the learned models are rather crude and the optimal controllers are based on linearization, both strong approximations of reality. Furthermore, it addresses the problem of learning from demonstrations and improving on such demonstrations to beat human performance. To the best of my knowledge, it is one of the first papers demonstrating the use of learning by demonstration, model learning, and optimal control together to achieve acrobatic tasks.

pilco: a model-based and data-efficient approach to policy search

Author(s): Marc Peter Deisenroth, Carl Edward Rasmussen
Venue: International Conference on Machine Learning
Year Published: 2011
Keywords: state estimation, reinforcement learning, probabilistic models, gaussians, dynamical systems, visual perception, policy gradients
Expert Opinion: PILCO is an extremely data-efficient model-based approach for learning policies. At the core of the approach is the Gaussian process transition model. This nonparametric Bayesian representation allows the robot to capture its uncertainty about the state transitions and thus reason about a distribution of potential models given the robot's past experiences. This approach allows the robot to compute analytical gradients for updating its policies and results in highly data-efficient learning. PILCO is a great example for highlighting the benefits of using model-based approaches and capturing uncertainty. PILCO has been used for a number of robotics projects over the years, including the control of a low-cost robotic arm for stacking blocks using visual feedback.
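
The role of the Gaussian process transition model can be illustrated with a small sketch. It is not PILCO itself (there is no moment matching and no policy optimization); it only shows, under stated assumptions, how a GP yields both a predicted state change and an uncertainty that grows away from observed data. The 1-D state, the RBF kernel, its hyperparameters, and the names `rbf_kernel` and `gp_predict` are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two 1-D arrays of states."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_predict(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and variance of the state change at x_query.

    x_train: observed states, y_train: observed state changes
    (assumed setup: 1-D state, no action input, zero prior mean).
    """
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve K^{-1} y via the Cholesky factor for numerical stability.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v ** 2, axis=0)
    return mean, var
```

Querying such a model near the training states returns a small variance, while querying far from them returns a variance close to the prior; this is the signal a model-based method can use to avoid trusting the model where it has no data.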

alvinn: an autonomous land vehicle in a neural network

Author(s): Dean A. Pomerleau
Venue: MITP
Year Published: 1989
Keywords: mobile robots, learning from demonstration, neural networks
Expert Opinion: On the theoretical side, the first paper to recognize covariate shift in imitation learning and provide a simple data-augmentation style strategy to improve it. On the implementation side, a real self-driving first that led to "No Hands Across America".

a survey on policy search for robotics

Author(s): Marc Peter Deisenroth, Gerhard Neumann, Jan Peters
Venue: Book
Year Published: 2013
Keywords: survey, reinforcement learning
Expert Opinion: A great unifying view on policy search.

robotic grasping of novel objects using vision

Author(s): Ashutosh Saxena, Justin Driemeyer, Andrew Y. Ng
Venue: International Journal of Robotics Research
Year Published: 2008
Keywords: neural networks, dynamical systems, visual perception, learning from demonstration, manipulation, planning
Expert Opinion: This is one of the first works in the literature that utilized machine learning for the robotic manipulation problem. The proposed framework is still useful for designing similar robot learning solutions. The particular importance of this work is that it identifies local features that are related to manipulation planning.

dynamical movement primitives: learning attractor models for motor behaviors

Author(s): Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, Stefan Schaal
Venue: Neural Computation (Volume 25, Issue 2)
Year Published: 2013
Keywords: planning, learning from demonstration, dynamical systems, nonlinear systems
Expert Opinion: The right parametrization is often the key in a learning system. Dynamical movement primitives (Ijspeert, Nakanishi, Schaal, 2003) are a very successful way to encode movements in robots. The idea is to use dynamical systems with desired properties, such as stable attractors or rhythmic solutions, as building blocks. This provides a low-dimensional parametrization, and combining them linearly allows for effective learning. So far it has mainly been used for learning from demonstration.
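
The attractor-plus-linear-combination idea can be sketched as a minimal one-dimensional discrete DMP. The sketch assumes the common formulation of a critically damped spring-damper system pulled toward the goal, perturbed by a phase-driven forcing term built from linearly weighted Gaussian basis functions. The function name `rollout_dmp`, the gain values, and the basis-width heuristic are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np


def rollout_dmp(weights, y0, goal, tau=1.0, dt=0.001, duration=2.0,
                alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps."""
    weights = np.asarray(weights, dtype=float)
    n_basis = len(weights)
    # Basis centers spread along the exponentially decaying phase variable.
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis ** 1.5 / centers  # heuristic width choice (assumption)

    y, z, x = float(y0), 0.0, 1.0
    trajectory = [y]
    for _ in range(int(round(duration / dt))):
        psi = np.exp(-widths * (x - centers) ** 2)
        # The forcing term is linear in the weights and vanishes as x -> 0,
        # so the stable attractor at the goal is preserved.
        forcing = x * (goal - y0) * (psi @ weights) / (psi.sum() + 1e-10)
        z_dot = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        y_dot = z / tau
        x_dot = -alpha_x * x / tau
        z += z_dot * dt
        y += y_dot * dt
        x += x_dot * dt
        trajectory.append(y)
    return np.array(trajectory)
```

Because the weights enter linearly, fitting them to a demonstrated trajectory reduces to a regression problem, which is one reason the representation is convenient for learning from demonstration.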

supersizing self-supervision: learning to grasp from 50k tries and 700 robot hours

Author(s): Lerrel Pinto, Abhinav Gupta
Venue: IEEE International Conference on Robotics and Automation (ICRA)
Year Published: 2015
Keywords: manipulation, reinforcement learning, neural networks
Expert Opinion: Pinto et al. were the first to exploit deep learning techniques to process large amounts of data collected by a robot running 24x7, significantly improving grasping accuracy without making any object-specific assumptions or requiring 3D models of objects. This paper inspired several works that use large-scale data to learn intuitive physics and manipulation of deformable objects, as well as impressive grasping works such as Google's arm farm and DexNet.

a reduction of imitation learning and structured prediction to no-regret online learning

Author(s): Stephane Ross, Geoffrey J. Gordon, J. Andrew Bagnell
Venue: 14th International Conference on Artificial Intelligence and Statistics
Year Published: 2011
Keywords: neural networks, learning from demonstration, dynamical systems
Expert Opinion: This paper provides the first formal analysis of the (dynamic) covariate shift problem, where the suboptimal execution behavior of a policy drives the system to different states than those observed during training. While the general problem itself was well-known at the time ("Behavioral Cloning: A Correction", Michie 1995; "ALVINN: An Autonomous Land Vehicle in a Neural Network", Pomerleau 1989), a disciplined analysis was lacking in the community. Ross et al. use a regret analysis to characterize and theoretically control the effects of dynamic covariate shift. The theory and algorithmic tools proposed in this work are still an active area of research today.
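
The remedy for covariate shift proposed in this paper is dataset aggregation (DAgger): roll out the current learner, ask the expert to label the states the learner actually visited, add them to the dataset, and retrain. The toy sketch below illustrates that loop under strong assumptions: a hypothetical noisy 1-D system, a linear expert with gain -0.5, a least-squares linear learner, and a simplified schedule in which only the first iteration executes the expert. None of these choices come from the paper.

```python
import numpy as np


def expert_action(x):
    """Hypothetical expert: proportional controller driving x toward 0."""
    return -0.5 * x


def rollout(policy_gain, x0, steps, rng, use_expert=False):
    """Run the system and return the states visited along the way."""
    visited = []
    x = x0
    for _ in range(steps):
        visited.append(x)
        action = expert_action(x) if use_expert else policy_gain * x
        x = x + action + 0.05 * rng.standard_normal()
    return np.array(visited)


def dagger(iterations=5, steps=20, seed=0):
    """Aggregate expert labels on learner-visited states, then refit."""
    rng = np.random.default_rng(seed)
    states = np.empty(0)
    labels = np.empty(0)
    gain = 0.0
    for i in range(iterations):
        # First iteration follows the expert; later ones follow the learner,
        # so training states match the learner's own state distribution.
        visited = rollout(gain, 1.0, steps, rng, use_expert=(i == 0))
        states = np.concatenate([states, visited])
        labels = np.concatenate([labels, expert_action(visited)])
        # Least-squares fit of a linear policy on the aggregated dataset.
        gain = float(states @ labels / (states @ states))
    return gain, len(states)
```

In this toy case the learner can represent the expert exactly, so the fitted gain matches the expert's. The interesting regime analysed in the paper is the one where it cannot, and the aggregated data keep the learner's errors from compounding.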

probabilistic robotics

Author(s): Sebastian Thrun, Wolfram Burgard, Dieter Fox
Venue: Book
Year Published: 2005
Keywords: probabilistic models
Expert Opinion: It laid out the basis for robotics in an uncertain real world.

policy gradient reinforcement learning for fast quadrupedal locomotion

Author(s): Nate Kohl, Peter Stone
Venue: IEEE International Conference on Robotics and Automation (ICRA)
Year Published: 2004
Keywords: reinforcement learning, policy gradients, locomotion, legged robots
Expert Opinion: The paper is one of the first impressive applications of policy gradient algorithms on real robots. The policy gradient algorithm is rather simple, but is able to optimize the gait of the AIBO robot efficiently.

hindsight experience replay

Author(s): Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba
Venue: Neural Information Processing Systems Conference (NeurIPS)
Year Published: 2018
Keywords: manipulation, humanoid robotics, reinforcement learning, neural networks
Expert Opinion: A really nice, simple idea for learning parameterized skills (building on UVFAs) and efficiently dealing with sparse reward. I think Learning Parameterized Motor Skills on a Humanoid Robot (Castro Da Silva et al.) has a much better description of the parameterized skill learning problem than the HER or UVFA papers, but the HER paper has better practical ideas.
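
The way hindsight relabeling copes with sparse reward can be illustrated with a small sketch. It assumes a 0/-1 sparse reward and only the simplest "final" relabeling strategy, in which the state actually reached at the end of an episode is substituted for the goal. The function names and the tuple layout are hypothetical, and a full implementation would store these transitions in the replay buffer of an off-policy learner.

```python
def sparse_reward(achieved, goal):
    """0 when the goal is reached, -1 otherwise (assumed reward shape)."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode, goal):
    """Return original transitions plus copies relabeled in hindsight.

    episode: list of (state, action, next_state) tuples.
    Output:  list of (state, action, next_state, goal, reward) tuples.
    """
    original = [
        (s, a, s_next, goal, sparse_reward(s_next, goal))
        for s, a, s_next in episode
    ]
    # Pretend the state we ended up in was the goal all along, so even a
    # failed episode yields at least one transition with a success reward.
    hindsight_goal = episode[-1][2]
    relabeled = [
        (s, a, s_next, hindsight_goal, sparse_reward(s_next, hindsight_goal))
        for s, a, s_next in episode
    ]
    return original + relabeled
```

For example, an episode that walks from state 0 to state 3 while aiming for goal 10 earns only -1 rewards under the original goal, but its relabeled copy ends with a reward of 0 for reaching goal 3.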

movement imitation with nonlinear dynamical systems in humanoid robots

Author(s): Auke Jan Ijspeert, Jun Nakanishi, Stefan Schaal
Venue: IEEE International Conference on Robotics and Automation (ICRA)
Year Published: 2002
Keywords: probabilistic models, nonlinear systems, dynamical systems, learning from demonstration, humanoid robotics
Expert Opinion: First work that proposes a practical movement primitive representation for robotics. Very concise paper: shows how much can be packed into 6 pages.

reinforcement learning: an introduction

Author(s): Richard S. Sutton and Andrew G. Barto
Venue: Book
Year Published: 2018
Keywords: mobile robots, reinforcement learning, unsupervised learning, optimal control, genetic algorithms
Expert Opinion: Somewhat repeating myself from the last suggestion: for learning robot behavior, reinforcement learning is an essential tool. While Sutton & Barto do not focus specifically on the case of robotics, their book is a very accessible text that nevertheless manages to cover many aspects, techniques, and challenges in reinforcement learning.

from skills to symbols: learning symbolic representations for abstract high-level planning

Author(s): George Konidaris, Leslie Pack Kaelbling, Tomas Lozano-Perez
Venue: Journal of Artificial Intelligence Research
Year Published: 2018
Keywords: probabilistic models, planning
Expert Opinion: As we get better at low-level robotic control, the community will need to start thinking more about longer-horizon problems and how to smoothly flow between reasoning at different levels of abstraction. This paper presents a theoretically grounded formal treatment of the problem, proves some nice results about what constitutes necessary and sufficient symbols for various types of planning, and shows some nice demos on a real robot. It is by far the best analysis of hierarchical learning / planning that I know of and provides a much-needed theoretical foundation for moving this area of research forward.

probabilistic movement primitives

Author(s): Alexandros Paraschos, Christian Daniel, Jan Peters, and Gerhard Neumann
Venue: Neural Information Processing Systems Conference (NeurIPS)
Year Published: 2013
Keywords: manipulation, probabilistic models, gaussians, planning, learning from demonstration
Expert Opinion: I recommend this and the following papers using ProMPs because they provide a very nice formulation for representing probabilistic movement primitives. ProMPs have many advantages, and I found them better than classical DMPs in many robotics applications, from gestures to whole-body manipulations.

intrinsic motivation systems for autonomous mental development

Author(s): Pierre-Yves Oudeyer, Frederic Kaplan, and Verena V. Hafner
Venue: IEEE Transactions on Evolutionary Computation (Volume 11, Issue 2)
Year Published: 2007
Keywords: reinforcement learning, evolution, neural networks
Expert Opinion: This work contributes to the general question of obtaining life-long learning robotic systems. A large body of the existing robot learning literature focuses on methods that enable robots to learn particular pre-defined skills and achieve particular tasks. Life-long learning, on the other hand, requires robots to learn skills and adapt to situations that were not (and cannot be) foreseen. Inspired by human development, intrinsic motivation is an important drive that guides robots towards regions that can be most effectively and efficiently learned with the capabilities developed so far, exploiting metrics such as novelty, curiosity, and diversity. This paper, in particular, is a seminal study that exploits maximization of learning progress in a real robot that explores its continuous sensorimotor space. It nicely shows that the robot exhibits stage-like development, learning easy tasks first and focusing on more complex problems later, progressively developing more advanced skills.

learning and generalization of motor skills by learning from demonstration

Author(s): Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal
Venue: IEEE International Conference on Robotics and Automation (ICRA)
Year Published: 2009
Keywords: planning, learning from demonstration
Expert Opinion: Not the first DMP paper, but the most understandable and with fixes to some annoying problems with the original formulation. Incredibly simple idea, but that's the nice thing about it -- it is a great starting point for talking about what generalization means in policy learning and how a restricted policy representation with the right inductive bias can allow you to learn something meaningful from a single trajectory, as well as learn quickly from practice.

maximum entropy inverse reinforcement learning

Author(s): Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey
Venue: AAAI Conference on Artificial Intelligence
Year Published: 2008
Keywords: probabilistic models, learning from demonstration, reinforcement learning
Expert Opinion: This work is one of the first to connect probabilistic inference with robot policy learning. Maximum Entropy Inverse Reinforcement Learning poses the classical Inverse Reinforcement Learning problem, well-studied for several years before this work, as maximizing the likelihood of observing a state distribution given a noisily optimal agent w.r.t. an unknown reward function. The inference method, model, and general principles not only inspired future IRL works (such as RelEnt-IRL, GP-IRL, and Guided Cost Learning), but have also been applied in Human-Robot Interaction and general policy search algorithms.
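
The likelihood formulation mentioned above can be written out compactly. The block below is a sketch for the deterministic-dynamics case with a reward that is linear in state features. The notation is an assumption chosen for this summary: $\zeta$ is a trajectory, $\mathbf{f}_{s}$ the feature vector of state $s$, $\theta$ the reward weights, $\tilde{\zeta}_m$ the $m$-th of $M$ demonstrations, and $D_{s_i}$ the expected visitation frequency of state $s_i$ under the current $\theta$.

```latex
\begin{align}
  P(\zeta \mid \theta)
    &= \frac{1}{Z(\theta)} \exp\!\left(\theta^{\top} \mathbf{f}_{\zeta}\right),
    & \mathbf{f}_{\zeta} &= \sum_{s_j \in \zeta} \mathbf{f}_{s_j}, \\
  L(\theta)
    &= \frac{1}{M} \sum_{m=1}^{M} \log P(\tilde{\zeta}_m \mid \theta),
    & \tilde{\mathbf{f}} &= \frac{1}{M} \sum_{m=1}^{M} \mathbf{f}_{\tilde{\zeta}_m}, \\
  \nabla_{\theta} L(\theta)
    &= \tilde{\mathbf{f}} - \sum_{\zeta} P(\zeta \mid \theta)\, \mathbf{f}_{\zeta}
     = \tilde{\mathbf{f}} - \sum_{s_i} D_{s_i}\, \mathbf{f}_{s_i}.
\end{align}
```

The gradient is the gap between the demonstrated feature counts and the feature counts expected under the current reward, so the optimum is reached when the two match.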