Abstract
We propose a fully convolutional conditional generative neural network, the latent transformation neural network, capable of rigid and non-rigid object view synthesis using a lightweight architecture suited for real-time applications and embedded systems. In contrast to existing object view synthesis methods, which incorporate conditioning information via concatenation, we introduce a dedicated network component, the conditional transformation unit. This unit is designed to learn the latent space transformations corresponding to specified target views. In addition, a consistency loss term is defined to guide the network toward learning the desired latent space mappings, a task-divided decoder is constructed to refine the quality of generated views of objects, and an adaptive discriminator is introduced to improve the adversarial training process. The generalizability of the proposed methodology is demonstrated on a collection of three diverse tasks: multi-view synthesis on real hand depth images, view synthesis of real and synthetic faces, and the rotation of rigid objects. The proposed model is shown to be comparable to state-of-the-art methods in the structural similarity index measure (SSIM) and L1 metrics while simultaneously achieving a 24% reduction in the compute time for inference of novel images.
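The abstract describes the conditional transformation unit (CTU) and consistency loss only at a high level. As a rough illustrative sketch of the underlying idea, a learned per-view mapping applied in latent space with a loss that ties the transformed code to the target view's encoding, consider the following minimal NumPy example. All names, shapes, and the linear form of the mapping are assumptions made for illustration; the paper's actual CTU is a convolutional network component, and this is not its implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent code, as might be produced by a convolutional
# encoder; flattened to a vector here for simplicity.
latent_dim = 8
z_source = rng.standard_normal(latent_dim)

# CTU sketch: one learned linear map per discrete target view.
# (The form of the mapping is an assumption for this sketch.)
num_views = 4
ctu_weights = rng.standard_normal((num_views, latent_dim, latent_dim)) * 0.1

def apply_ctu(z, view_idx):
    """Transform a latent code toward the latent code of a target view."""
    return ctu_weights[view_idx] @ z

def consistency_loss(z_transformed, z_target):
    """Penalize distance between the transformed latent code and the
    encoding of the true target view (mean squared error here)."""
    return float(np.mean((z_transformed - z_target) ** 2))

# Hypothetical encoding of the ground-truth target view.
z_target = rng.standard_normal(latent_dim)
z_mapped = apply_ctu(z_source, view_idx=2)
loss = consistency_loss(z_mapped, z_target)
```

In training, the loss would be minimized over the CTU parameters so that each per-view mapping steers the latent code toward the encoder's representation of that view, which is the role the consistency loss term plays in the described architecture.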
| Original language | English |
| --- | --- |
| Pages (from-to) | 1663-1677 |
| Number of pages | 15 |
| Journal | Visual Computer |
| Volume | 36 |
| Issue number | 8 |
| DOIs | |
| Publication status | Published - 2020 Aug 1 |
| Externally published | Yes |
Bibliographical note
Funding Information:
Karthik Ramani acknowledges the US National Science Foundation Awards NRI-1637961 and IIP-1632154. Guang Lin acknowledges the US National Science Foundation Awards DMS-1555072, DMS-1736364 and DMS-1821233. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agency. We gratefully appreciate the support of NVIDIA Corporation with the donation of GPUs used for this research. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Publisher Copyright:
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
Keywords
- Conditional generative model
- Fully convolutional
- Latent transformation
- Object view synthesis
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Computer Graphics and Computer-Aided Design