Abstract
Generating novel views of an object from a single image is a challenging task. It requires both understanding the underlying 3D structure of the object from an image and rendering high-quality, spatially consistent new views. While recent diffusion-based methods for view synthesis have shown great progress, achieving consistency among the various view estimates while also abiding by the desired camera pose remains a critical unsolved problem. In this work, we demonstrate a strikingly simple method that utilizes a pre-trained video diffusion model to solve this problem. Our key idea is that synthesizing a novel view can be reformulated as synthesizing a video of a camera going around the object of interest (a scanning video), which allows us to leverage the powerful priors that a video diffusion model has learned. Thus, to perform novel-view synthesis, we create a smooth camera trajectory to the target view that we wish to render, and denoise using both a view-conditioned diffusion model and a video diffusion model. By doing so, we obtain highly consistent novel view synthesis, outperforming the state of the art.
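The two steps named in the abstract, building a smooth camera trajectory to the target view and denoising with two models, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `CameraPose` parameterization (azimuth, elevation, radius), the linear interpolation in `smooth_trajectory`, and in particular the convex combination in `fused_noise_estimate`, since the abstract does not say how the two models' predictions are combined.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical pose parameterization (an assumption, not from the paper)."""
    azimuth_deg: float
    elevation_deg: float
    radius: float


def _lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def smooth_trajectory(start: CameraPose, target: CameraPose,
                      num_frames: int) -> List[CameraPose]:
    """Return num_frames poses moving evenly from start to target.

    Assumption: plain linear interpolation of each pose component,
    with no handling of azimuth wrap-around at 360 degrees.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    poses = []
    for i in range(num_frames):
        t = i / (num_frames - 1)
        poses.append(CameraPose(
            _lerp(start.azimuth_deg, target.azimuth_deg, t),
            _lerp(start.elevation_deg, target.elevation_deg, t),
            _lerp(start.radius, target.radius, t),
        ))
    return poses


def fused_noise_estimate(eps_view: Sequence[float],
                         eps_video: Sequence[float],
                         video_weight: float = 0.5) -> List[float]:
    """Blend two noise predictions for one denoising step.

    Assumption: a simple convex combination; the actual fusion rule
    used by the authors is not described in the abstract.
    """
    if len(eps_view) != len(eps_video):
        raise ValueError("noise estimates must have the same length")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must lie in [0, 1]")
    return [(1.0 - video_weight) * a + video_weight * b
            for a, b in zip(eps_view, eps_video)]


if __name__ == "__main__":
    # Four frames from the input view to a target 90 degrees around the object.
    path = smooth_trajectory(CameraPose(0.0, 0.0, 1.5),
                             CameraPose(90.0, 30.0, 1.5), num_frames=4)
    for pose in path:
        print(pose)
    # Toy noise vectors standing in for the two models' predictions.
    print(fused_noise_estimate([1.0, 2.0], [3.0, 4.0], video_weight=0.5))
```

In this sketch the last frame of the trajectory is the target view, so the novel view would be read off as the final frame of the generated scanning video. Real noise predictions are image-shaped tensors; flat lists are used here only to keep the example dependency-free.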
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 |
| Publisher | IEEE Computer Society |
| Pages | 6775-6785 |
| Number of pages | 11 |
| ISBN (Electronic) | 9798350353006 |
| ISBN (Print) | 9798350353006 |
| DOIs | |
| Publication status | Published - 2024 |
| Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States. Duration: 2024 Jun 16 → 2024 Jun 22 |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| ISSN (Print) | 1063-6919 |
Conference
| Conference | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 |
|---|---|
| Country/Territory | United States |
| City | Seattle |
| Period | 24/6/16 → 24/6/22 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Diffusion Model
- Novel View Synthesis
- Video Diffusion Model
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
Title
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models