Abstract
In this paper, we propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy function as a probability distribution over trajectories and train the policy using a deep latent variable model to achieve sample-efficient skill learning. We first evaluate the sample efficiency of DLPG against state-of-the-art reinforcement learning methods in simulated environments. Then, we apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills: turning left, going straight, and turning right. We demonstrate that, by properly designing two reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity.
Original language | English |
---|---|
Title of host publication | 2019 International Conference on Robotics and Automation, ICRA 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1-7 |
Number of pages | 7 |
ISBN (Electronic) | 9781538660263 |
Publication status | Published - 2019 May |
Externally published | Yes |
Event | 2019 International Conference on Robotics and Automation, ICRA 2019 - Montreal, Canada; Duration: 2019 May 20 → 2019 May 24 |
Publication series
Name | Proceedings - IEEE International Conference on Robotics and Automation |
---|---|
ISSN (Print) | 1050-4729 |
Conference
Conference | 2019 International Conference on Robotics and Automation, ICRA 2019 |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 2019 May 20 → 2019 May 24 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Artificial Intelligence