Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis

Sang Hoon Lee, Hyun Wook Yoon, Hyeong Rae Noh, Ji Hoon Kim, Seong Whan Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    39 Citations (Scopus)

    Abstract

    While generative adversarial network (GAN) based neural text-to-speech (TTS) systems have shown significant improvement in neural speech synthesis, no existing TTS system learns to synthesize speech from text sequences with only adversarial feedback. Because adversarial feedback alone is not sufficient to train the generator, current models still require a reconstruction loss that directly compares the generated mel-spectrogram with the ground truth. In this paper, we present Multi-SpectroGAN (MSG), which can train a multi-speaker model with only adversarial feedback by conditioning a conditional discriminator on a self-supervised hidden representation of the generator. This leads to better guidance for generator training. Moreover, we propose adversarial style combination (ASC) for better generalization to unseen speaking styles and transcripts, which learns latent representations of the combined style embedding from multiple mel-spectrograms. Trained with ASC and feature matching, the MSG synthesizes high-diversity mel-spectrograms by controlling and mixing individual speaking styles (e.g., duration, pitch, and energy). The results show that the MSG synthesizes a high-fidelity mel-spectrogram with almost the same naturalness MOS as the ground-truth mel-spectrogram.

    Original language: English
    Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
    Publisher: Association for the Advancement of Artificial Intelligence
    Pages: 13198-13206
    Number of pages: 9
    ISBN (Electronic): 9781713835974
    Publication status: Published - 2021
    Event: 35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
    Duration: 2021 Feb 2 → 2021 Feb 9

    Publication series

    Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
    Volume: 14B

    Conference

    Conference: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
    City: Virtual, Online
    Period: 21/2/2 → 21/2/9

    Bibliographical note

    Publisher Copyright:
    © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

    ASJC Scopus subject areas

    • Artificial Intelligence
