TY - GEN
T1 - Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots
AU - Lee, Kyungjae
AU - Kim, Sungyub
AU - Lim, Sungbin
AU - Choi, Sungjoon
AU - Hong, Mineui
AU - Kim, Jaein
AU - Park, Yong Lae
AU - Oh, Songhwai
N1 - Funding Information:
Acknowledgements This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01190, [SW Star Lab] Robot Learning: Efficient, Safe, and Socially-Acceptable Machine Learning, and No. 2020-0-01336, Artificial Intelligence Graduate School Program (UNIST)), the National Research Foundation (NRF-2016R1A5A1938472) funded by the Korean Government (MSIT), and the 2020 Research Fund (1.200086.01, 1.200098.01) of UNIST (Ulsan National Institute of Science & Technology).
Publisher Copyright:
© 2020, MIT Press Journals. All rights reserved.
PY - 2020
Y1 - 2020
N2 - In this paper, we present a new class of entropy-regularized Markov decision processes (MDPs), referred to as Tsallis MDPs, which inherently generalize well-known maximum entropy reinforcement learning (RL) by introducing an additional real-valued parameter called an entropic index. Our theoretical results enable us to derive and analyze different types of optimal policies with interesting properties related to the stochasticity of the optimal policy by controlling the entropic index. To handle complex and model-free problems, such as learning a controller for a soft mobile robot, we propose a Tsallis actor-critic (TAC) method. We first observe that different RL problems have different desirable entropic indices, and that using a proper entropic index results in superior performance compared to state-of-the-art actor-critic methods. To mitigate the exhaustive search for the entropic index, we propose a quick-and-dirty curriculum method of gradually increasing the entropic index, which will be referred to as TAC with Curricula (TAC2). TAC2 shows performance comparable to TAC with the optimal entropic index. Finally, we apply TAC2 to learn a controller for a soft mobile robot, where TAC2 outperforms existing actor-critic methods in terms of both convergence speed and utility.
AB - In this paper, we present a new class of entropy-regularized Markov decision processes (MDPs), referred to as Tsallis MDPs, which inherently generalize well-known maximum entropy reinforcement learning (RL) by introducing an additional real-valued parameter called an entropic index. Our theoretical results enable us to derive and analyze different types of optimal policies with interesting properties related to the stochasticity of the optimal policy by controlling the entropic index. To handle complex and model-free problems, such as learning a controller for a soft mobile robot, we propose a Tsallis actor-critic (TAC) method. We first observe that different RL problems have different desirable entropic indices, and that using a proper entropic index results in superior performance compared to state-of-the-art actor-critic methods. To mitigate the exhaustive search for the entropic index, we propose a quick-and-dirty curriculum method of gradually increasing the entropic index, which will be referred to as TAC with Curricula (TAC2). TAC2 shows performance comparable to TAC with the optimal entropic index. Finally, we apply TAC2 to learn a controller for a soft mobile robot, where TAC2 outperforms existing actor-critic methods in terms of both convergence speed and utility.
UR - http://www.scopus.com/inward/record.url?scp=85116691499&partnerID=8YFLogxK
U2 - 10.15607/RSS.2020.XVI.036
DO - 10.15607/RSS.2020.XVI.036
M3 - Conference contribution
AN - SCOPUS:85116691499
SN - 9780992374761
T3 - Robotics: Science and Systems
BT - Robotics
A2 - Toussaint, Marc
A2 - Bicchi, Antonio
A2 - Hermans, Tucker
PB - MIT Press Journals
T2 - 16th Robotics: Science and Systems, RSS 2020
Y2 - 12 July 2020 through 16 July 2020
ER -