EM-LAST: Effective Multidimensional Latent Space Transport for an Unpaired Image-to-Image Translation with an Energy-Based Model

Giwoong Han, Jinhong Min, Sung Won Han

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

For unpaired image-to-image translation to work effectively, the latent space of each image domain must be well designed. The codes of each style must be translated toward the target while preserving the parts corresponding to the source content. Most Variational Autoencoder (VAE)-based models use a one-dimensional latent space. However, applying high-dimensional methodologies such as vector quantization requires controlling a multidimensional latent space. In this study, among the VAE-based models that use relatively complex multidimensional latent spaces, we combine an Energy-Based Model with Vector-Quantized VAE v2, using the latter as the main model. We show that, among the latent spaces that represent each image domain, the importance of each feature in the top and bottom latent spaces must be interpreted differently for appropriate translation. We therefore argue that a good understanding of how the latent space is composed can by itself yield effective image translation results. We also present various analyses and visual outcomes of multidimensional latent space transport.

Original language: English
Pages (from-to): 72839-72849
Number of pages: 11
Journal: IEEE Access
Volume: 10
Publication status: Published - 2022

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Energy-based model
  • Langevin dynamics
  • image-to-image translation
  • multidimensional latent space
  • vector-quantized variational autoencoder

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
  • Electrical and Electronic Engineering
