A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Author(s): Stephane Ross, Geoffrey J. Gordon, J. Andrew Bagnell
Venue: 14th International Conference on Artificial Intelligence and Statistics
Year Published: 2011
Keywords: neural networks, learning from demonstration, dynamical systems
Expert Opinion: Imitation learning is a very appealing approach to learning robot skills. This paper shows that the straightforward technique of 'behavioral cloning', which simply copies the expert demonstrations, is actually not a good idea in sequential tasks. The reason is an accumulation of errors: once the learning agent strays from the states seen in the demonstrations, its learned policy is no longer accurate, causing it to stray even further from them. The beauty of the paper is in capturing this idea mathematically, using the theoretical framework of no-regret online learning, and suggesting a simple algorithmic solution to the problem. The method, dubbed Dataset Aggregation (DAgger), asks for additional expert actions *on states visited by the policy*. The idea of controlling the distribution shift between the expert and the learner has since been fundamental to robotic imitation learning, and has manifested in various other methods.
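The data-collection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D environment (`step`), the saturating expert (`expert_action`), and the linear least-squares learner (`fit_policy`) are all hypothetical stand-ins chosen to keep the example self-contained. It rolls out the expert only in the first iteration and the learned policy afterwards, and it returns the last policy rather than selecting the best one on validation data.

```python
import random


def expert_action(state):
    """Hypothetical expert: steer toward the origin, command clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -state))


def step(state, action, rng):
    """Hypothetical 1-D dynamics with a little Gaussian noise (an assumption)."""
    return state + 0.5 * action + rng.gauss(0.0, 0.05)


def fit_policy(dataset):
    """Least-squares fit of action = w * state + b on all aggregated pairs."""
    n = len(dataset)
    mean_s = sum(s for s, _ in dataset) / n
    mean_a = sum(a for _, a in dataset) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in dataset)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in dataset)
    w = cov / var_s if var_s > 0 else 0.0
    b = mean_a - w * mean_s
    return lambda state: w * state + b


def rollout(policy, horizon, rng):
    """Run `policy` for `horizon` steps and return the states it visited."""
    state = rng.uniform(-2.0, 2.0)
    visited = []
    for _ in range(horizon):
        visited.append(state)
        state = step(state, policy(state), rng)
    return visited


def dagger(n_iters=5, horizon=20, seed=0):
    """DAgger-style loop: label the learner's own states with expert actions."""
    rng = random.Random(seed)
    dataset = []
    policy = expert_action  # first iteration: roll out the expert itself
    for _ in range(n_iters):
        # 1. Collect the states visited by the current policy.
        visited = rollout(policy, horizon, rng)
        # 2. Ask the expert what it would have done in each of those states.
        dataset.extend((s, expert_action(s)) for s in visited)
        # 3. Retrain on the aggregate of all data gathered so far.
        policy = fit_policy(dataset)
    return policy, dataset


if __name__ == "__main__":
    learned_policy, data = dagger()
    print(len(data), "labeled states collected")
```

The first pass through the loop is plain behavioral cloning, since the data comes from the expert's own rollout. Every later pass adds expert labels for states that the learner itself reached, which is the step the opinion above emphasizes.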