Abstract
In deep reinforcement learning, finding the optimal manipulation policy for a multi-DoF manipulator in 3D space requires intricate reward shaping. However, reward shaping requires cumbersome optimization of the reward function based on prior knowledge of the robotic task to be achieved. It is therefore desirable to learn various manipulation policies with a simple reward function. In this study, we propose a method that learns the manipulation policy of a manipulator in a sparse reward setting. To this end, Hindsight Experience Replay (HER) is combined with Twin Delayed DDPG (TD3) by applying a goal strategy that incorporates demonstrations for the policy. It is shown that the policy can estimate the joint control commands of a 7-DoF manipulator from raw RGB video inputs in a sparse reward setting in an end-to-end manner.
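To make the sparse-reward idea concrete, the following is a minimal sketch of hindsight goal relabeling as used in HER. It is not the method of this paper: it uses the standard "future" relabeling strategy rather than the demonstration-based goal strategy described in the abstract, and the transition layout, the tolerance `tol`, and the names `sparse_reward` and `her_relabel` are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

Vec = Tuple[float, ...]
# Hypothetical transition layout: (state, action, next_state, achieved_goal, goal),
# where achieved_goal is the goal actually reached after taking the action.
Step = Tuple[Vec, Vec, Vec, Vec, Vec]
# Relabeled transition: (state, action, next_state, goal, reward).
Transition = Tuple[Vec, Vec, Vec, Vec, float]


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Return 0.0 if `achieved` lies within `tol` of `goal`, otherwise -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(
    episode: List[Step], k: int = 4, rng: Optional[random.Random] = None
) -> List[Transition]:
    """Return the original transitions plus k hindsight copies of each.

    Each hindsight copy swaps the goal for an achieved goal sampled from the
    current or a later step of the same episode ("future" strategy, assumed
    here) and recomputes the sparse reward against that substituted goal.
    """
    rng = rng or random.Random(0)
    out: List[Transition] = []
    for t, (s, a, s2, achieved, goal) in enumerate(episode):
        # Keep the original transition with its (usually -1.0) reward.
        out.append((s, a, s2, goal, sparse_reward(achieved, goal)))
        for _ in range(k):
            future = rng.randrange(t, len(episode))
            new_goal = episode[future][3]
            out.append((s, a, s2, new_goal, sparse_reward(achieved, new_goal)))
    return out
```

An off-policy learner such as TD3 could then sample from the relabeled transitions; because some substituted goals coincide with what was actually achieved, the buffer contains non-failure rewards even when the original goal was never reached.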
Original language | English |
---|---|
Title of host publication | 2019 16th International Conference on Ubiquitous Robots, UR 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 159-164 |
Number of pages | 6 |
ISBN (Electronic) | 9781728132327 |
Publication status | Published - 2019 Jun |
Event | 16th International Conference on Ubiquitous Robots, UR 2019 - Jeju, Korea, Republic of; Duration: 2019 Jun 24 → 2019 Jun 27 |
Publication series
Name | 2019 16th International Conference on Ubiquitous Robots, UR 2019 |
---|---|
Conference
Conference | 16th International Conference on Ubiquitous Robots, UR 2019 |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju |
Period | 2019 Jun 24 → 2019 Jun 27 |
Bibliographical note
Funding Information: This work was supported by an IITP grant funded by the Korea Government (MSIT) (No. 2018-0-00622).
Publisher Copyright:
© 2019 IEEE.
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Human-Computer Interaction
- Mechanical Engineering
- Control and Optimization