DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models

Daewon Chae, June Suk Choi, Jinkyu Kim, Kimin Lee

Research output: Contribution to journal › Conference article › peer-review

Abstract

Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse samples with strong reward signals is crucial for improving sample efficiency and overall performance. In this work, we introduce DiffExp, a simple yet effective exploration strategy for reward fine-tuning of text-to-image models. Our approach employs two key strategies: (a) dynamically adjusting the scale of classifier-free guidance to enhance sample diversity, and (b) randomly weighting phrases of the text prompt to exploit high-quality reward signals. We demonstrate that these strategies significantly enhance exploration during online sample generation, improving the sample efficiency of recent reward fine-tuning methods, such as DDPO and AlignProp.
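
Neither the abstract nor the metadata on this page specifies how the two strategies are implemented, so the following is only a minimal sketch of the ideas as described. The guidance-scale range, the phrase-weight range, the comma-based phrase splitting, and all function names are illustrative assumptions, not values or code from the paper.

```python
import random

# Hypothetical ranges -- the paper's actual values are not given in the abstract.
CFG_MIN, CFG_MAX = 3.0, 12.0
PHRASE_WEIGHT_MIN, PHRASE_WEIGHT_MAX = 0.5, 1.5


def sample_guidance_scale(rng: random.Random) -> float:
    """Strategy (a): draw a fresh classifier-free guidance scale for each
    online rollout instead of using one fixed scale, so the generated batch
    covers a wider region of the model's output space."""
    return rng.uniform(CFG_MIN, CFG_MAX)


def weight_prompt_phrases(prompt: str, rng: random.Random) -> list[tuple[str, float]]:
    """Strategy (b): split the prompt into phrases and attach a random weight
    to each. Downstream, each weight would scale that phrase's contribution to
    the text conditioning (e.g., by scaling its token embeddings); the exact
    mechanism here is an assumption, not taken from the paper."""
    phrases = [p.strip() for p in prompt.split(",") if p.strip()]
    return [(p, rng.uniform(PHRASE_WEIGHT_MIN, PHRASE_WEIGHT_MAX)) for p in phrases]


if __name__ == "__main__":
    rng = random.Random(0)
    prompt = "a corgi wearing sunglasses, studio lighting, watercolor style"
    for step in range(3):  # three online rollouts
        cfg = sample_guidance_scale(rng)
        weighted = weight_prompt_phrases(prompt, rng)
        print(f"rollout {step}: cfg={cfg:.2f}, phrases={weighted}")
```

In a reward fine-tuning loop (e.g., DDPO or AlignProp), the sampled guidance scale and weighted phrases would feed into the diffusion model's sampling call for each online rollout, diversifying the samples that the reward model then scores.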

Original language: English
Pages (from-to): 15696-15703
Number of pages: 8
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 15
DOIs
Publication status: Published - 2025 Apr 11
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 2025 Feb 25 – 2025 Mar 4

Bibliographical note

Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
