TY - GEN
T1 - Tidy-Up Tasks Using Trajectory-based Imitation Learning
AU - Kim, Doo Jun
AU - Jo, Hyun Jun
AU - Song, Jae Bok
N1 - Funding Information:
This work was supported by an IITP grant funded by the Korea Government (MSIT) (No. 2018-0-00622).
Publisher Copyright:
© 2021 ICROS.
PY - 2021
Y1 - 2021
N2 - When performing reinforcement learning with a robot arm in a real environment, it is important that learning proceeds safely and quickly, because unexpected behaviors during learning and prolonged training can damage the robot arm or surrounding objects. In this study, a trajectory-based imitation learning method is proposed that suppresses unexpected situations and quickly learns policies suitable for the robot by limiting the workspace to be explored through a single human demonstration. Trajectory-based imitation learning consists of two stages. First, a reference trajectory is generated from the position of the target object and the expert trajectory obtained from the human demonstration. Second, the target task is trained by performing reinforcement learning based on the generated reference trajectory. Experiments were conducted in simulation and real environments to verify the proposed imitation learning algorithm. In simulation, a laptop-folding task was performed with a success rate of 97% to verify the performance of the algorithm. In addition, it was shown that safe and fast learning is possible with only one demonstration video for a drawer-arrangement task in a real environment.
AB - When performing reinforcement learning with a robot arm in a real environment, it is important that learning proceeds safely and quickly, because unexpected behaviors during learning and prolonged training can damage the robot arm or surrounding objects. In this study, a trajectory-based imitation learning method is proposed that suppresses unexpected situations and quickly learns policies suitable for the robot by limiting the workspace to be explored through a single human demonstration. Trajectory-based imitation learning consists of two stages. First, a reference trajectory is generated from the position of the target object and the expert trajectory obtained from the human demonstration. Second, the target task is trained by performing reinforcement learning based on the generated reference trajectory. Experiments were conducted in simulation and real environments to verify the proposed imitation learning algorithm. In simulation, a laptop-folding task was performed with a success rate of 97% to verify the performance of the algorithm. In addition, it was shown that safe and fast learning is possible with only one demonstration video for a drawer-arrangement task in a real environment.
KW - Human demonstration
KW - Manipulation
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85124233599&partnerID=8YFLogxK
U2 - 10.23919/ICCAS52745.2021.9649826
DO - 10.23919/ICCAS52745.2021.9649826
M3 - Conference contribution
AN - SCOPUS:85124233599
T3 - International Conference on Control, Automation and Systems
SP - 496
EP - 499
BT - 2021 21st International Conference on Control, Automation and Systems, ICCAS 2021
PB - IEEE Computer Society
T2 - 21st International Conference on Control, Automation and Systems, ICCAS 2021
Y2 - 12 October 2021 through 15 October 2021
ER -