Explaining generative diffusion models via visual analysis for interpretable decision-making process

Ji Hoon Park, Yeong Joon Ju, Seong Whan Lee

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Diffusion models have demonstrated remarkable performance in generation tasks. Nevertheless, explaining the diffusion process remains challenging because it is a sequence of denoising steps on noisy images that experts find difficult to interpret. To address this issue, we propose three research questions that interpret the diffusion process from the perspective of the visual concepts generated by the model and the regions the model attends to at each time step. We devise tools for visualizing the diffusion process and answering these research questions, rendering the process human-understandable. Through experiments with various visual analyses using these tools, we show how the output is progressively generated, explaining the level of denoising and highlighting relationships to foundational visual concepts at each time step. First, we rigorously examine spatial recovery levels to understand the model's focal region during denoising with respect to semantic content and level of detail. In doing so, we show that the denoising model begins image recovery in regions containing semantic information and progresses toward areas with finer-grained details. Second, we explore how specific concepts are highlighted at each denoising step by aligning generated images with the prompts used to produce them. By observing the internal flow of the diffusion process, we show how the model strategically predicts a particular visual concept at each denoising step to complete the final image. Finally, we extend our analysis to decode the visual concepts embedded across all time steps of the process. During training, the diffusion model learns diverse visual concepts corresponding to each time step, enabling it to predict varying levels of visual concepts at different stages. We substantiate our tools using the Area Under the Curve (AUC) score, correlation quantification, and cross-attention mapping. Our findings provide insights into the diffusion process and pave the way for further research into explainable diffusion mechanisms.
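The cross-attention mapping and per-time-step analyses described above can be made concrete with a short sketch. The example below is a minimal, illustrative implementation, not the paper's released tooling: it assumes the Hugging Face diffusers library and the Stable Diffusion v1.5 checkpoint (neither is specified in the abstract), and StoreCrossAttnProcessor is a hypothetical helper name. It records the cross-attention probabilities between prompt tokens and latent pixels at every attention layer and denoising step, which is the raw material for per-concept, per-time-step heatmaps.

import torch
from diffusers import StableDiffusionPipeline

class StoreCrossAttnProcessor:
    """Attention processor that records cross-attention probabilities.

    Illustrative only: it re-implements the plain attention path and omits
    edge cases (group norm, 4-D inputs) that diffusers' built-in
    AttnProcessor handles; this suffices for SD-style UNet transformer blocks.
    """

    def __init__(self, store):
        self.store = store  # filled with (batch*heads, pixels, tokens) maps

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        is_cross = encoder_hidden_states is not None
        context = encoder_hidden_states if is_cross else hidden_states
        query = attn.head_to_batch_dim(attn.to_q(hidden_states))
        key = attn.head_to_batch_dim(attn.to_k(context))
        value = attn.head_to_batch_dim(attn.to_v(context))
        probs = attn.get_attention_scores(query, key, attention_mask)
        if is_cross:
            # One map per layer per denoising step; keep on CPU to save VRAM.
            self.store.append(probs.detach().cpu())
        out = attn.batch_to_head_dim(torch.bmm(probs, value))
        out = attn.to_out[0](out)   # output projection (Linear)
        return attn.to_out[1](out)  # dropout

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

maps = []
pipe.unet.set_attn_processor(StoreCrossAttnProcessor(maps))
image = pipe("a photo of a cat on a sofa", num_inference_steps=25).images[0]

# Averaging the column for one token index over heads and layers within a
# step, then reshaping to the latent grid, yields a heatmap per time step
# showing where that visual concept is being formed during denoising.

Inspecting how such heatmaps sharpen and migrate across steps is one way to reproduce, at small scale, the kind of evidence the abstract describes: early steps attending to semantically important regions and later steps refining fine detail.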

Original language: English
Article number: 123231
Journal: Expert Systems With Applications
Volume: 248
DOIs
Publication status: Published - 2024 Aug 15

Bibliographical note

Publisher Copyright: © 2024 Elsevier Ltd

Keywords

  • Diffusion process
  • Explainable artificial intelligence
  • Generative neural networks
  • Saliency map

ASJC Scopus subject areas

  • General Engineering
  • Computer Science Applications
  • Artificial Intelligence
