Trajectory-based probabilistic policy gradient for learning locomotion behaviors

Sungjoon Choi, Joohyung Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

In this paper, we propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy function as a probability distribution over trajectories and train the policy using a deep latent variable model to achieve sample-efficient skill learning. We first evaluate the sample efficiency of DLPG against state-of-the-art reinforcement learning methods in simulated environments. Then, we apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills: turn left, go straight, and turn right. We demonstrate that, by properly designing two reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity.

Original language: English
Title of host publication: 2019 International Conference on Robotics and Automation, ICRA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-7
Number of pages: 7
ISBN (Electronic): 9781538660263
Publication status: Published - 2019 May
Externally published: Yes
Event: 2019 International Conference on Robotics and Automation, ICRA 2019 - Montreal, Canada
Duration: 2019 May 20 to 2019 May 24

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print): 1050-4729

Conference

Conference: 2019 International Conference on Robotics and Automation, ICRA 2019
Country/Territory: Canada
City: Montreal
Period: 19/5/20 to 19/5/24

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
