TY - JOUR
T1 - Wasserstein training of restricted Boltzmann machines
AU - Montavon, Grégoire
AU - Müller, Klaus-Robert
AU - Cuturi, Marco
N1 - Funding Information:
This work was supported by the Brain Korea 21 Plus Program through the National Research Foundation of Korea, funded by the Ministry of Education. This work was also supported by DFG grant MU 987/17-1. M. Cuturi gratefully acknowledges the support of JSPS Young Researcher (A) grant 26700002.
Publisher Copyright:
© 2016 NIPS Foundation - All Rights Reserved.
PY - 2016
Y1 - 2016
AB - Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. We propose in this work a novel approach for Boltzmann machine training which assumes that a meaningful metric between observations is known. This metric between observations can then be used to define the Wasserstein distance between the distribution induced by the Boltzmann machine on the one hand, and that given by the training sample on the other hand. We derive a gradient of that distance with respect to the model parameters. Minimization of this new objective leads to generative models with different statistical properties. We demonstrate their practical potential on data completion and denoising, for which the metric between observations plays a crucial role.
UR - http://www.scopus.com/inward/record.url?scp=85019180324&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85019180324
SN - 1049-5258
SP - 3718
EP - 3726
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 30th Annual Conference on Neural Information Processing Systems, NIPS 2016
Y2 - 5 December 2016 through 10 December 2016
ER -