Abstract
This paper investigates the audio-to-image generation problem, in which appropriate images are generated from audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained on both audio and images to extract useful audio features for audio-to-image generation. CMCRL upgraded the Generative Adversarial Network (GAN) to achieve high performance in the generation learning phase, but the GAN showed training instability. This paper proposes C-SupConGAN, which uses the conditional supervised contrastive loss (C-SupCon loss). C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN), which considers data-to-data and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained using CMCRL are used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relational information between a data embedding and the corresponding audio embedding (data-to-source relationships) or between a data embedding and the corresponding image embedding (data-to-target relationships). Extensive experiments show that the proposed method improves performance, generates higher-quality images for audio-to-image generation than previous research, and effectively alleviates the training collapse of the GAN.
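To make the data-to-data and data-to-class relationships mentioned above concrete, the following is a minimal sketch of a 2C-style conditional contrastive loss in the spirit of ContraGAN. It is not the C-SupCon loss proposed in the paper, whose exact form the abstract does not give, and it omits the data-to-source and data-to-target terms. The function names, the L2 normalization of embeddings, and the default temperature are assumptions made for illustration.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumption: similarities are cosine similarities).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conditional_contrastive_loss(data_emb, class_emb, labels, temperature=1.0):
    """Mean 2C-style loss over a batch (illustrative sketch, hypothetical names).

    data_emb:  list of data embedding vectors, one per sample
    class_emb: dict mapping each label to its class embedding vector
    labels:    list of labels, one per sample
    """
    data = [l2_normalize(v) for v in data_emb]
    proxies = {c: l2_normalize(v) for c, v in class_emb.items()}
    n = len(data)
    total = 0.0
    for i in range(n):
        # Data-to-class term: similarity between sample i and its own class embedding.
        d2c = math.exp(dot(data[i], proxies[labels[i]]) / temperature)
        positives = d2c
        denominator = d2c
        for k in range(n):
            if k == i:
                continue
            # Data-to-data term: similarity between sample i and every other sample.
            sim = math.exp(dot(data[i], data[k]) / temperature)
            denominator += sim
            if labels[k] == labels[i]:
                # Same-class samples count as positives in the numerator.
                positives += sim
        total += -math.log(positives / denominator)
    return total / n
```

In this sketch the loss is small when same-class embeddings cluster together and near their class embedding, and larger when samples that share a label lie far apart.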
Original language | English |
---|---|
Title of host publication | Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 |
Publisher | Association for Computing Machinery |
Pages | 135-142 |
Number of pages | 8 |
ISBN (Electronic) | 9781450398749 |
Publication status | Published - 2022 Dec 17 |
Event | 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 - Osaka, Japan (Duration: 2022 Dec 17 → 2022 Dec 19) |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 |
---|---|
Country/Territory | Japan |
City | Osaka |
Period | 2022 Dec 17 → 2022 Dec 19 |
Bibliographical note
Publisher Copyright: © 2022 ACM.
Keywords
- Audio-to-Image Generation
- Contrastive Learning
- Cross-Modal Generation
- Generative Adversarial Networks
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Software