Maximum causal Tsallis entropy imitation learning

Kyungjae Lee, Sungjoon Choi, Songhwai Oh

Research output: Contribution to journal › Conference article › peer-review

16 Citations (Scopus)

Abstract

In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning, which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by assigning zero probability. Second, we prove that an MCTE problem is equivalent to robust Bayes estimation in the sense of the Brier score. Third, we propose a maximum causal Tsallis entropy imitation learning (MCTEIL) algorithm with a sparse mixture density network (sparse MDN) by modeling mixture weights using a sparsemax distribution. In particular, we show that the causal Tsallis entropy of an MDN encourages exploration and efficient mixture utilization while Shannon entropy is less effective.
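
To make the abstract's central objects concrete, the sketch below computes a sparsemax distribution (the closed-form solution of the MCTE problem over a discrete action set, following Martins & Astudillo's sparsemax) and the Tsallis entropy with entropic index q = 2 that it maximizes, with softmax and Shannon entropy shown for comparison. This is a minimal NumPy illustration, not code from the paper; the function names and example scores are assumptions chosen for clarity.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of the score vector z onto the
    probability simplex. Unlike softmax, it can assign exactly zero
    probability to low-scoring actions (illustrative implementation)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                   # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1.0 + k * z_sorted > cumsum         # indices kept in the support
    k_z = k[support][-1]                          # size of the support set
    tau = (cumsum[support][-1] - 1.0) / k_z       # threshold
    return np.maximum(z - tau, 0.0)

def tsallis_entropy(p):
    """Tsallis entropy with q = 2, the regularizer behind sparsemax:
    S(p) = 0.5 * (1 - sum_i p_i^2)."""
    p = np.asarray(p, dtype=float)
    return 0.5 * (1.0 - np.sum(p ** 2))

def shannon_entropy(p):
    """Shannon entropy for comparison (0 * log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

if __name__ == "__main__":
    scores = np.array([2.0, 1.9, 0.1, -1.0])      # hypothetical action scores
    p_sparse = sparsemax(scores)
    p_soft = np.exp(scores) / np.exp(scores).sum()
    print("sparsemax:", p_sparse)                 # low-scoring actions get exactly 0
    print("softmax:  ", p_soft)                   # every action keeps some probability
    print("Tsallis entropy:", tsallis_entropy(p_sparse))
    print("Shannon entropy:", shannon_entropy(p_sparse))
```

In this example the two low-scoring actions receive exactly zero probability under sparsemax, illustrating the abstract's claim that the MCTE solution can exclude unnecessary actions, whereas softmax always keeps every action in its support.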

Original language: English
Pages (from-to): 4403-4413
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
Publication status: Published - 2018
Externally published: Yes
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2018 Dec 2 - 2018 Dec 8

Bibliographical note

Funding Information:
This work was supported in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017R1A2B2006136) and by the Brain Korea 21 Plus Project in 2018.

Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
