Abstract
Recent work has demonstrated that reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights is difficult and may cause over-optimization of certain metrics. To solve this, we propose Parrot, which formulates the problem as multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate the Pareto-optimal solution. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use this novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, which significantly improves image quality and also allows the trade-off among different rewards to be controlled with a reward-related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to the user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.
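As a rough illustration of what batch-wise Pareto-optimal selection can look like, the sketch below keeps only the non-dominated samples in a batch of per-image reward vectors. It is a generic non-dominated filter and not the authors' implementation: the function name `non_dominated_mask`, the two-reward layout, and the example scores are assumptions made for illustration, and the abstract does not specify how Parrot uses the selected samples during training.

```python
import numpy as np


def non_dominated_mask(rewards: np.ndarray) -> np.ndarray:
    """Return a boolean mask of the non-dominated rows.

    rewards has shape (batch, num_rewards); higher is better for every reward.
    Row j dominates row i if j is >= i on all rewards and > i on at least one.
    """
    n = rewards.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        at_least_as_good = np.all(rewards >= rewards[i], axis=1)
        strictly_better = np.any(rewards > rewards[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False  # some other sample dominates sample i
    return mask


# Hypothetical batch: columns could be, e.g., an aesthetics score and a
# text-image alignment score (illustrative values only).
batch_rewards = np.array([
    [0.9, 0.2],
    [0.5, 0.5],
    [0.4, 0.4],
    [0.2, 0.9],
    [0.9, 0.1],
])
mask = non_dominated_mask(batch_rewards)
print(mask)
```

In this toy batch, the third sample is dominated by the second and the fifth by the first, so only the remaining three lie on the batch's Pareto front. No hand-tuned reward weights are needed to make that selection, which is the motivation the abstract gives for the approach.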
| Original language | English |
|---|---|
| Title of host publication | Computer Vision – ECCV 2024 - 18th European Conference, Proceedings |
| Editors | Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 462-478 |
| Number of pages | 17 |
| ISBN (Print) | 9783031729195 |
| Publication status | Published - 2025 |
| Event | 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy. Duration: 2024 Sept 29 → 2024 Oct 4 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 15096 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 18th European Conference on Computer Vision, ECCV 2024 |
|---|---|
| Country/Territory | Italy |
| City | Milan |
| Period | 2024 Sept 29 → 2024 Oct 4 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science