Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion
Author(s): Nate Kohl, Peter Stone
Venue: IEEE International Conference on Robotics and Automation (ICRA)
Year Published: 2004
Keywords: reinforcement learning, policy gradients, locomotion, legged robots
Expert Opinion: The work is practical in that it allowed the authors to improve the walking speed of Aibos, something essential to creating top-flight RoboCup players. The reason I adore this work and frequently cite it in my talks on machine learning is the fantastic way it allowed the robots to learn autonomously. In particular, for the Aibo robots to succeed in RoboCup, they need to be able to localize themselves on the field based on their perception of provided markers. The authors leveraged this capability to let the robots measure their own walking speed. As a team of robots marched back and forth across the width of the pitch, trying and evaluating a different gait on each pass, they found movement patterns that surpassed hand-designed ones. It's a beautiful example of exploiting measurable quantities to drive learning, a key enabling technology for robot learning.
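
To make the "try a gait, measure its speed, adjust" loop concrete, here is a minimal sketch of a finite-difference policy gradient search in Python. It is an illustration under stated assumptions, not the authors' implementation: `measure_speed` is a hypothetical toy objective standing in for a robot timing its own walk between field markers, `TARGET` is an invented optimum, and the parameter values, step sizes, and trial counts are placeholders chosen only so the example runs.

```python
import math
import random

# Hypothetical optimum of the toy objective; a real robot has no known target.
TARGET = [1.0, -2.0, 0.5]


def measure_speed(gait):
    """Toy stand-in (an assumption) for a robot timing its own walk."""
    return -sum((g - t) ** 2 for g, t in zip(gait, TARGET))


def mean(values):
    return sum(values) / len(values)


def estimate_gradient(gait, evaluate, epsilon, num_trials, rng):
    """Estimate a per-parameter improvement direction from random perturbations."""
    # For each parameter, group trial scores by the perturbation sign applied.
    buckets = [{-1: [], 0: [], 1: []} for _ in gait]
    for _ in range(num_trials):
        signs = [rng.choice((-1, 0, 1)) for _ in gait]
        candidate = [g + s * epsilon for g, s in zip(gait, signs)]
        score = evaluate(candidate)
        for bucket, s in zip(buckets, signs):
            bucket[s].append(score)

    gradient = []
    for bucket in buckets:
        # Without both a +epsilon and a -epsilon sample there is no estimate.
        if not bucket[1] or not bucket[-1]:
            gradient.append(0.0)
            continue
        avg_plus, avg_minus = mean(bucket[1]), mean(bucket[-1])
        # Leave the parameter alone if not perturbing it scored best.
        if bucket[0] and mean(bucket[0]) > max(avg_plus, avg_minus):
            gradient.append(0.0)
        else:
            gradient.append(avg_plus - avg_minus)
    return gradient


def improve_gait(gait, evaluate, epsilon=0.05, step_size=0.1,
                 num_trials=15, iterations=200, seed=0):
    """Repeatedly step the gait parameters along the estimated direction."""
    rng = random.Random(seed)
    gait = list(gait)
    for _ in range(iterations):
        grad = estimate_gradient(gait, evaluate, epsilon, num_trials, rng)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            continue  # no usable direction this round; sample again
        # Normalize so every update moves a fixed distance in parameter space.
        gait = [x + step_size * g / norm for x, g in zip(gait, grad)]
    return gait


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    learned = improve_gait(start, measure_speed)
    print("speed before:", measure_speed(start))
    print("speed after: ", measure_speed(learned))
```

The point the sketch tries to capture is that the learner never needs a model of the robot or the floor: it only needs a number it can measure for each candidate gait. On real hardware each call to `evaluate` would be a noisy timed walk, so the number of trials per iteration trades learning time against the reliability of the estimated direction.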