VoiceMixer: Adversarial Voice Style Mixup

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Although recent advances in voice conversion have shown significant improvement, there still remains a gap between the converted voice and the target voice. A key factor that maintains this gap is the insufficient decomposition of content and voice style from the source speech. This insufficiency leads to converted speech that retains the source speech style or loses source speech content. In this paper, we present VoiceMixer, which can effectively decompose and transfer voice style through a novel information bottleneck and adversarial feedback. With self-supervised representation learning, the proposed information bottleneck can decompose the content and style with only a small loss of content information. Also, for adversarial feedback on each type of information, the discriminator is decomposed into content and style discriminators with self-supervision, which enables our model to achieve better generalization to the voice style of the converted speech. The experimental results show the superiority of our model in disentanglement and transfer performance, as well as improved audio quality from preserving content information.

    Original language: English
    Title of host publication: Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
    Editors: Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
    Publisher: Neural information processing systems foundation
    Pages: 294-308
    Number of pages: 15
    ISBN (Electronic): 9781713845393
    Publication status: Published - 2021
    Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
    Duration: 2021 Dec 6 to 2021 Dec 14

    Publication series

    Name: Advances in Neural Information Processing Systems
    Volume: 1
    ISSN (Print): 1049-5258

    Conference

    Conference: 35th Conference on Neural Information Processing Systems, NeurIPS 2021
    City: Virtual, Online
    Period: 21/12/6 to 21/12/14

    Bibliographical note

    Funding Information:
    This work was partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University) and No. 2021-0-02068, Artificial Intelligence Innovation Hub) and Microsoft Research Asia (MSRA).

    Publisher Copyright:
    © 2021 Neural information processing systems foundation. All rights reserved.

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Signal Processing
