ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

  • Jeong Gi Kwak
  • Erqun Dong
  • Yuhe Jin
  • Hanseok Ko
  • Shweta Mahajan
  • Kwang Moo Yi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Generating novel views of an object from a single image is a challenging task. It requires an understanding of the underlying 3D structure of the object from an image and rendering high-quality, spatially consistent new views. While recent methods for view synthesis based on diffusion have shown great progress, achieving consistency among various view estimates and at the same time abiding by the desired camera pose remains a critical problem yet to be solved. In this work, we demonstrate a strikingly simple method, where we utilize a pre-trained video diffusion model to solve this problem. Our key idea is that synthesizing a novel view could be reformulated as synthesizing a video of a camera going around the object of interest (a scanning video), which then allows us to leverage the powerful priors that a video diffusion model would have learned. Thus, to perform novel-view synthesis, we create a smooth camera trajectory to the target view that we wish to render, and denoise using both a view-conditioned diffusion model and a video diffusion model. By doing so, we obtain a highly consistent novel view synthesis, outperforming the state of the art.
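
The two steps named in the abstract (building a smooth camera trajectory to the target view, then denoising with two models) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the trajectory is parameterized or how the two models' outputs are combined, so the linear interpolation of relative (elevation, azimuth) poses, the weighted sum of noise estimates, and every name below (Pose, smooth_trajectory, combine_noise, w_video) are assumptions made purely for illustration. Plain floats stand in for per-frame latents.

```python
from typing import List, Tuple

# Hypothetical pose type: (elevation_deg, azimuth_deg) relative to the input view.
Pose = Tuple[float, float]


def smooth_trajectory(target: Pose, num_frames: int) -> List[Pose]:
    """Linearly interpolate relative camera poses from the input view (0, 0)
    to the target view. The last frame is the view we want to render.
    Linear interpolation is an assumption, not taken from the paper."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    elev, azim = target
    last = num_frames - 1
    return [(elev * i / last, azim * i / last) for i in range(num_frames)]


def combine_noise(eps_view: List[float], eps_video: List[float],
                  w_video: float) -> List[float]:
    """Blend per-frame noise estimates from a view-conditioned model and a
    video model. The weighted sum is an assumed combination rule."""
    if len(eps_view) != len(eps_video):
        raise ValueError("both models must score the same frames")
    return [(1.0 - w_video) * a + w_video * b
            for a, b in zip(eps_view, eps_video)]


# Four poses sweeping from the input view to 30 deg elevation, 90 deg azimuth.
trajectory = smooth_trajectory(target=(30.0, 90.0), num_frames=4)

# Stand-in noise estimates for two frames, blended with equal weight.
blended = combine_noise([1.0, 2.0], [3.0, 4.0], w_video=0.5)
```

In a real pipeline, each pose in the trajectory would condition the view-conditioned model for one frame, the video model would score all frames jointly, and the blended estimate would drive each denoising step. The final frame of the denoised sequence would be the requested novel view.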

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 6775-6785
Number of pages: 11
ISBN (Electronic): 9798350353006
ISBN (Print): 9798350353006
DOIs
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 2024 Jun 16 to 2024 Jun 22

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 24/6/16 to 24/6/22

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • Diffusion Model
  • Novel View Synthesis
  • Video Diffusion Model

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
