Abstract
Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse samples with strong reward signals is crucial for improving sample efficiency and overall performance. In this work, we introduce DiffExp, a simple yet effective exploration strategy for reward fine-tuning of text-to-image models. Our approach employs two key strategies: (a) dynamically adjusting the scale of classifier-free guidance to enhance sample diversity, and (b) randomly weighting phrases of the text prompt to exploit high-quality reward signals. We demonstrate that these strategies significantly enhance exploration during online sample generation, improving the sample efficiency of recent reward fine-tuning methods, such as DDPO and AlignProp.
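The abstract only sketches the two exploration strategies, so the following is a minimal, hypothetical illustration of how they might be applied inside an online reward fine-tuning loop. All names here (`sample_guidance_scale`, `weight_prompt_phrases`, `phrase_spans`, `sampler`, `reward_model`) and the uniform sampling ranges are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Hypothetical sketch of the two DiffExp exploration strategies, assuming a
# CFG-based sampler that accepts a per-rollout guidance scale and per-token
# prompt embeddings. Names and ranges are illustrative assumptions only.
import random
import torch

def sample_guidance_scale(low: float = 3.0, high: float = 9.0) -> float:
    """Strategy (a): draw the classifier-free guidance scale at random for
    each online rollout, instead of fixing it, to diversify the samples."""
    return random.uniform(low, high)

def weight_prompt_phrases(
    prompt_embeds: torch.Tensor,          # (num_tokens, dim) text-encoder output
    phrase_spans: list[tuple[int, int]],  # token index range of each phrase
    low: float = 0.5,
    high: float = 1.5,
) -> torch.Tensor:
    """Strategy (b): rescale the embeddings of each phrase by a random weight
    so that different rollouts emphasize different parts of the prompt."""
    weighted = prompt_embeds.clone()
    for start, end in phrase_spans:
        weighted[start:end] *= random.uniform(low, high)
    return weighted

# Possible usage in an online rollout (sampler / reward_model are placeholders):
#   scale  = sample_guidance_scale()
#   embeds = weight_prompt_phrases(encode(prompt), spans_for(prompt))
#   image  = sampler(embeds, guidance_scale=scale)
#   reward = reward_model(image, prompt)
```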
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 15696-15703 |
| Number of pages | 8 |
| Journal | Proceedings of the AAAI Conference on Artificial Intelligence |
| Volume | 39 |
| Issue number | 15 |
| DOIs | |
| Publication status | Published - 2025 Apr 11 |
| Event | 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025, Philadelphia, United States. Duration: 2025 Feb 25 → 2025 Mar 4 |
Bibliographical note
Publisher Copyright: Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
ASJC Scopus subject areas
- Artificial Intelligence