Wasserstein training of restricted Boltzmann machines

Grégoire Montavon, Klaus-Robert Müller, Marco Cuturi

Research output: Contribution to journal › Conference article › peer-review

57 Citations (Scopus)

Abstract

Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. In this work, we propose a novel approach to Boltzmann machine training that assumes a meaningful metric between observations is known. This metric can then be used to define the Wasserstein distance between the distribution induced by the Boltzmann machine on the one hand, and that given by the training sample on the other hand. We derive a gradient of that distance with respect to the model parameters. Minimizing this new objective leads to generative models with different statistical properties. We demonstrate their practical potential on data completion and denoising, where the metric between observations plays a crucial role.
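
For intuition, the following is a minimal sketch (not the authors' implementation) of the training signal the abstract describes: samples drawn from the model are compared to data samples through an entropy-regularized (Sinkhorn) approximation of the Wasserstein distance, and the resulting dual potential reweights the usual log-likelihood gradient. The helpers grad_log_p and metric are hypothetical placeholders, and the dual-potential formula follows standard Sinkhorn conventions rather than the paper's exact derivation.

import numpy as np

def sinkhorn_dual(C, eps=0.1, n_iter=200):
    # Entropy-regularized optimal transport between two uniform empirical
    # measures with cost matrix C (n model samples x m data samples).
    # Returns the dual potential on the model side.
    n, m = C.shape
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):            # Sinkhorn fixed-point updates
        u = (np.ones(n) / n) / (K @ v)
        v = (np.ones(m) / m) / (K.T @ u)
    alpha = eps * np.log(u)            # dual variable, defined up to a constant
    return alpha - alpha.mean()        # center it for the gradient estimate

def wasserstein_gradient(model_samples, data_samples, grad_log_p, metric):
    # Monte Carlo estimate of the gradient of the (smoothed) Wasserstein
    # distance with respect to the model parameters:
    #   grad ~= E_model[ alpha(x) * grad_theta log p_theta(x) ]
    # grad_log_p(x) is assumed to return the usual RBM log-likelihood
    # gradient at sample x; metric(x, y) is the chosen ground metric.
    C = np.array([[metric(x, y) for y in data_samples]
                  for x in model_samples])
    alpha = sinkhorn_dual(C)
    grads = np.array([grad_log_p(x) for x in model_samples])
    return (alpha[:, None] * grads).mean(axis=0)

In practice, such a gradient would replace the KL-based update inside a stochastic gradient loop; the paper additionally smooths and regularizes the objective, which this sketch omits.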

Original language: English
Pages (from-to): 3718-3726
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Publication status: Published - 2016
Event: 30th Annual Conference on Neural Information Processing Systems, NIPS 2016 - Barcelona, Spain
Duration: 2016 Dec 5 – 2016 Dec 10

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing