Abstract
In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by assigning zero probability. Second, we prove that an MCTE problem is equivalent to robust Bayes estimation in the sense of the Brier score. Third, we propose a maximum causal Tsallis entropy imitation learning (MCTEIL) algorithm with a sparse mixture density network (sparse MDN) by modeling mixture weights using a sparsemax distribution. In particular, we show that the causal Tsallis entropy of an MDN encourages exploration and efficient mixture utilization while Shannon entropy is less effective.
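The contrast between sparsemax and softmax drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It uses the standard sort-and-threshold projection onto the probability simplex for sparsemax, and it assumes the index-2 Tsallis entropy form, 1/2 Σ p(1 − p). The `scores` vector and the scale factor `alpha` are hypothetical; dividing the scores by `alpha` is shown only as one plausible way the support could be adjusted.

```python
import numpy as np

def softmax(z):
    """Standard softmax: every action gets strictly positive probability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1 + k * z_sorted > cumsum  # true for a prefix of the sorted scores
    k_max = k[in_support][-1]               # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max   # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def tsallis_entropy(p):
    """Index-2 Tsallis entropy, 1/2 * sum p (1 - p) (assumed form)."""
    p = np.asarray(p, dtype=float)
    return 0.5 * np.sum(p * (1.0 - p))

# Hypothetical action scores, for illustration only.
scores = np.array([2.0, 1.5, 0.1, -1.0])
alpha = 4.0  # hypothetical scale; larger values flatten the scores

p_soft = softmax(scores)
p_sparse = sparsemax(scores)
p_sparse_wide = sparsemax(scores / alpha)

print("softmax:           ", p_soft)
print("sparsemax:         ", p_sparse)
print("sparsemax, scaled: ", p_sparse_wide)
print("Tsallis entropy:   ", tsallis_entropy(p_sparse))
```

With these scores, sparsemax assigns exactly zero probability to the two lowest-scoring actions, while softmax keeps all four positive. Under the scaling assumption, dividing by `alpha` enlarges the support to three actions.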
Original language | English |
---|---|
Pages (from-to) | 4403-4413 |
Number of pages | 11 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2018-December |
Publication status | Published - 2018 |
Externally published | Yes |
Event | 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada; Duration: 2018 Dec 2 → 2018 Dec 8 |
Bibliographical note
Funding Information: This work was supported in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017R1A2B2006136) and by the Brain Korea 21 Plus Project in 2018.
Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing