C-SupConGAN: Using Contrastive Learning and Trained Data Features for Audio-To-Image Generation

Haechun Chung, Jong Kook Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, the audio-to-image generation problem is investigated, where appropriate images are generated from an audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained on both audio and images to extract useful audio features for audio-to-image generation. CMCRL upgraded the Generative Adversarial Network (GAN) to achieve high performance in the generation learning phase, but the GAN showed training instability. In this paper, C-SupConGAN, which uses the conditional supervised contrastive loss (C-SupCon loss), is proposed. C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN), which considers data-to-data relationships and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained using CMCRL are used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relational information between a data embedding and the corresponding audio embedding (data-to-source relationships) or between a data embedding and the corresponding image embedding (data-to-target relationships). Extensive experiments show that the proposed method improves performance, generates higher-quality images for audio-to-image generation than previous research, and effectively alleviates the training collapse of the GAN.
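
The abstract does not give the C-SupCon loss in closed form, so the following is only a minimal sketch of the general idea, assuming a 2C-style contrastive loss in which each data embedding is pulled toward its class embedding (data-to-class) and toward same-class data embeddings (data-to-data), and pushed away from other-class data embeddings. The optional `extra_emb` argument is a hypothetical stand-in for the paired audio or image embedding (data-to-source or data-to-target); how the paper actually combines that term is an assumption here, as are all function and parameter names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_2c_style_loss(data_emb, labels, class_emb,
                              temperature=1.0, extra_emb=None):
    """Average 2C-style contrastive loss over a batch (illustrative only).

    data_emb  : list of data embedding vectors
    labels    : list of integer class labels, one per data embedding
    class_emb : list of class embedding vectors, indexed by label
    extra_emb : optional list of paired embeddings, one per sample
                (ASSUMPTION: stands in for the audio/image embedding)
    """
    n = len(data_emb)
    total = 0.0
    for i in range(n):
        # Data-to-class positive term.
        pos = math.exp(cosine(data_emb[i], class_emb[labels[i]]) / temperature)
        # ASSUMPTION: a paired source/target embedding acts as an extra positive.
        if extra_emb is not None:
            pos += math.exp(cosine(data_emb[i], extra_emb[i]) / temperature)
        neg = 0.0
        for k in range(n):
            if k == i:
                continue
            sim = math.exp(cosine(data_emb[i], data_emb[k]) / temperature)
            if labels[k] == labels[i]:
                pos += sim  # data-to-data positive (same class)
            else:
                neg += sim  # data-to-data negative (different class)
        total += -math.log(pos / (pos + neg))
    return total / n


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    classes = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_2c_style_loss(emb, [0, 0, 1, 1], classes))
```

In this sketch, a batch whose labels match the geometry of the embeddings yields a lower loss than the same batch with shuffled labels, which is the behavior a contrastive term in the discriminator is meant to reward.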

Original language: English
Title of host publication: Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022
Publisher: Association for Computing Machinery
Pages: 135-142
Number of pages: 8
ISBN (Electronic): 9781450398749
Publication status: Published - 2022 Dec 17
Event: 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022 - Osaka, Japan
Duration: 2022 Dec 17 to 2022 Dec 19

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th Artificial Intelligence and Cloud Computing Conference, AICCC 2022
Country/Territory: Japan
City: Osaka
Period: 22/12/17 to 22/12/19

Bibliographical note

Publisher Copyright:
© 2022 ACM.

Keywords

  • Audio-To-Image Generation
  • Contrastive Learning
  • Cross-Modal Generation
  • Generative Adversarial Networks

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
