Abstract
For unpaired image-to-image translation to work effectively, the latent space of each image domain must be well designed. The codes of each style must be translated toward the target, while the parts corresponding to the source content are preserved. Most Variational Autoencoder (VAE)-based models use a one-dimensional latent space. However, applying high-dimensional methods such as vector quantization requires control over a multidimensional latent space. In this study, among the VAE-based models that use relatively complex multidimensional latent spaces, we apply an Energy-Based Model and the Vector-Quantized VAE v2, with the latter as the main model. We show that, among the latent spaces that represent each image domain, the features of the top and bottom latent spaces differ in importance and must be interpreted differently for appropriate translation. We therefore argue that a sound understanding of how the latent space is composed can by itself yield effective image translation results. We also present various analyses and visual outcomes of multidimensional latent space transport.
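As a rough illustration of the two ingredients named above, the following sketch moves a small multidimensional latent grid with Langevin dynamics and then vector-quantizes it against a codebook. It is a minimal sketch under stated assumptions, not the paper's implementation: the quadratic energy `E(z) = 0.5 * ||z - target_mean||^2` is a toy stand-in for a learned Energy-Based Model, and the names `quantize`, `langevin_step`, `target_mean`, and `step_size`, as well as the grid and codebook sizes, are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent, codebook):
    """Map each D-dim vector of an H x W latent grid to its nearest codebook entry.

    Returns the grid of codebook indices and the grid of quantized vectors.
    """
    indices, quantized = [], []
    for row in latent:
        idx_row, q_row = [], []
        for vec in row:
            k = min(range(len(codebook)),
                    key=lambda i: squared_distance(vec, codebook[i]))
            idx_row.append(k)
            q_row.append(list(codebook[k]))
        indices.append(idx_row)
        quantized.append(q_row)
    return indices, quantized


def langevin_step(latent, target_mean, step_size, rng):
    """One Langevin update: z <- z - (step/2) * grad E(z) + sqrt(step) * noise.

    ASSUMPTION: toy energy E(z) = 0.5 * ||z - target_mean||^2, whose gradient
    is (z - target_mean). A learned energy network would replace this term.
    """
    noise_scale = step_size ** 0.5
    return [
        [
            [z - 0.5 * step_size * (z - m) + noise_scale * rng.gauss(0.0, 1.0)
             for z, m in zip(vec, target_mean)]
            for vec in row
        ]
        for row in latent
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = [[0.0, 0.0], [1.0, 1.0]]          # K = 2 entries, D = 2
    latent = [[[0.1, -0.1], [0.2, 0.0]],         # H = 2, W = 2 source codes
              [[0.0, 0.3], [-0.2, 0.1]]]
    for _ in range(50):                          # transport toward the target
        latent = langevin_step(latent, [1.0, 1.0], 0.05, rng)
    indices, _ = quantize(latent, codebook)
    print(indices)
```

In a hierarchical model such as VQ-VAE-2, a procedure of this kind would run separately on the top and bottom latent grids, which is where the abstract's point about treating the two levels differently would apply.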
Original language | English |
---|---|
Pages (from-to) | 72839-72849 |
Number of pages | 11 |
Journal | IEEE Access |
Volume | 10 |
DOIs | |
Publication status | Published - 2022 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Energy-based model
- Langevin dynamics
- Image-to-image translation
- Multidimensional latent space
- Vector-quantized variational autoencoder
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering
- Electrical and Electronic Engineering