Abstract
This paper investigates the audio-to-image generation problem, in which appropriate images are generated from an audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained on both audio and images to extract audio features useful for audio-to-image generation. CMCRL upgraded the Generative Adversarial Network (GAN) to achieve high performance in the generation learning phase, but the GAN showed training instability. This paper proposes C-SupConGAN, which uses the conditional supervised contrastive loss (C-SupCon loss). C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN), which considers data-to-data relationships and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained using CMCRL are used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relational information between a data embedding and the corresponding audio embedding (data-to-source relationships) or between a data embedding and the corresponding image embedding (data-to-target relationships). Extensive experiments show that the proposed method improves performance, generates higher-quality images for audio-to-image generation than previous research, and effectively alleviates the training collapse of the GAN.
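The abstract does not give the loss formula, so the following is only a minimal NumPy sketch of how a loss of this kind might combine the three relationship types it names. It assumes a supervised-contrastive-style form (the log-ratio is averaged over each anchor's positives); the function name `c_supcon_style_loss`, its parameters, and the temperature value are hypothetical, and the exact C-SupCon formulation in the paper may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def c_supcon_style_loss(data_emb, class_emb, labels, source_emb=None,
                        temperature=0.1):
    """Hypothetical sketch, NOT the paper's exact loss.

    data_emb:   (n, dim) embeddings of the images seen by the discriminator
    class_emb:  (num_classes, dim) learnable class embeddings
    labels:     (n,) integer class labels
    source_emb: optional (n, dim) audio embeddings, one per sample
    """
    d = l2_normalize(np.asarray(data_emb, dtype=np.float64))
    c = l2_normalize(np.asarray(class_emb, dtype=np.float64))
    labels = np.asarray(labels)
    n = d.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Data-to-data: other samples with the same label are positives.
    logits = [d @ d.T / temperature]
    positives = [(labels[:, None] == labels[None, :]) & off_diag]
    valid = [off_diag]  # an anchor is never compared with itself

    # Data-to-class: each anchor's own class embedding is a positive.
    logits.append(np.sum(d * c[labels], axis=1, keepdims=True) / temperature)
    positives.append(np.ones((n, 1), dtype=bool))
    valid.append(np.ones((n, 1), dtype=bool))

    # Data-to-source (assumed form): the paired audio embedding is a
    # positive; the audio embeddings of other samples act as negatives.
    if source_emb is not None:
        s = l2_normalize(np.asarray(source_emb, dtype=np.float64))
        logits.append(d @ s.T / temperature)
        positives.append(np.eye(n, dtype=bool))
        valid.append(np.ones((n, n), dtype=bool))

    logits = np.concatenate(logits, axis=1)
    positives = np.concatenate(positives, axis=1)
    valid = np.concatenate(valid, axis=1)

    # Numerically stable log-sum-exp over the valid candidates of each anchor.
    masked = np.where(valid, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))

    # Average the negative log-probability over each anchor's positives.
    log_prob = logits - log_denom
    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1) / positives.sum(axis=1)
    return float(per_anchor.mean())
```

In this sketch, passing `source_emb=None` leaves only the data-to-data and data-to-class terms, while supplying audio embeddings adds the data-to-source term. A data-to-target variant could be obtained the same way by passing image embeddings instead.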
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 |
| Publisher | Association for Computing Machinery |
| Pages | 135-142 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781450398749 |
| Publication status | Published - 2022 Dec 17 |
| Event | 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 - Osaka, Japan. Duration: 2022 Dec 17 → 2022 Dec 19 |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 |
|---|---|
| Country/Territory | Japan |
| City | Osaka |
| Period | 2022 Dec 17 → 2022 Dec 19 |
Bibliographical note
Publisher Copyright: © 2022 ACM.
Keywords
- Audio-To-Image Generation
- Contrastive Learning
- Cross-Modal Generation
- Generative Adversarial Networks
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Software