COMPOSE AND CONQUER: DIFFUSION-BASED 3D DEPTH AWARE COMPOSABLE IMAGE SYNTHESIS

  • Jonghyun Lee
  • , Hansam Cho
  • , Youngjoon Yoo
  • , Seoung Bum Kim*
  • , Yonghyun Jeong*
  • *Corresponding author for this work

Research output: Contribution to conferencePaperpeer-review

Abstract

Addressing the limitations of text as a source of accurate layout representation in text-conditional diffusion models, many works incorporate additional signals to condition certain attributes within a generated image. Although successful, previous works do not account for the specific localization of said attributes extended into the three dimensional plane. In this context, we present a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. Specifically, we first introduce depth disentanglement training to leverage the relative depth of objects as an estimator, allowing the model to identify the absolute positions of unseen objects through the use of synthetic image triplets. We also introduce soft guidance, a method for imposing global semantics onto targeted regions without the use of any additional localization cues. Our integrated framework, COMPOSE AND CONQUER (CNC), unifies these techniques to localize multiple conditions in a disentangled manner. We demonstrate that our approach allows perception of objects at varying depths while offering a versatile framework for composing localized objects with different global semantics.

Original languageEnglish
Publication statusPublished - 2024
Event12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: 2024 May 72024 May 11

Conference

Conference12th International Conference on Learning Representations, ICLR 2024
Country/TerritoryAustria
CityHybrid, Vienna
Period24/5/724/5/11

Bibliographical note

Publisher Copyright:
© 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'COMPOSE AND CONQUER: DIFFUSION-BASED 3D DEPTH AWARE COMPOSABLE IMAGE SYNTHESIS'. Together they form a unique fingerprint.

Cite this