Many-to-many voice conversion experiments using a Korean speech corpus

Dongsuk Yook, Hyung Jin Seo, Bonggu Ko, In Chul Yoo

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, Generative Adversarial Networks (GAN) and Variational AutoEncoders (VAE) have been applied to voice conversion methods that can use non-parallel training data. In particular, Conditional Cycle-Consistent Generative Adversarial Networks (CC-GAN) and Cycle-Consistent Variational AutoEncoders (CycleVAE) show promising results in many-to-many voice conversion among multiple speakers. However, conventional voice conversion studies using CC-GANs and CycleVAEs have involved relatively small numbers of speakers. In this paper, we extend the number of speakers to 100 and experimentally analyze the performance of these many-to-many voice conversion methods. The experiments show that the CC-GAN achieves 4.5 % lower Mel-Cepstral Distortion (MCD) for a small number of speakers, whereas the CycleVAE achieves 12.7 % lower MCD for a large number of speakers under a limited training time.
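
The abstract compares the two methods using Mel-Cepstral Distortion (MCD). As a reference point, the sketch below shows one common way to compute MCD between time-aligned mel-cepstral sequences; the cepstral order, the frame alignment, and the function name are illustrative assumptions rather than details taken from the paper.

```python
# A minimal sketch of Mel-Cepstral Distortion (MCD) in dB, assuming the two
# sequences are already time-aligned (e.g., via DTW) and that the 0th
# cepstral coefficient (frame energy) is excluded, as is conventional.
import numpy as np

def mel_cepstral_distortion(ref_mcep: np.ndarray, conv_mcep: np.ndarray) -> float:
    """Average MCD in dB between aligned mel-cepstral sequences.

    Both arrays have shape (num_frames, num_coefficients).
    """
    diff = ref_mcep[:, 1:] - conv_mcep[:, 1:]
    # Per-frame MCD: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))

# Hypothetical usage with random 25-dimensional mel-cepstra over 200 frames.
rng = np.random.default_rng(0)
ref = rng.standard_normal((200, 25))
conv = ref + 0.1 * rng.standard_normal((200, 25))
print(f"MCD: {mel_cepstral_distortion(ref, conv):.2f} dB")
```

Lower MCD indicates converted speech whose spectral envelope is closer to the target speaker's, which is how the 4.5 % and 12.7 % relative differences in the abstract should be read.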

Original language: English
Pages (from-to): 351-358
Number of pages: 8
Journal: Journal of the Acoustical Society of Korea
Volume: 41
Issue number: 3
DOIs
Publication status: Published - 2022

Bibliographical note

Publisher Copyright:
Copyright © 2022 The Acoustical Society of Korea.

Keywords

  • Conditional Cycle-Consistent Generative Adversarial Network (CC-GAN)
  • Cycle-Consistent Variational AutoEncoder (CycleVAE)
  • Generative Adversarial Network (GAN)
  • Variational AutoEncoder (VAE)
  • Voice conversion

ASJC Scopus subject areas

  • Signal Processing
  • Instrumentation
  • Acoustics and Ultrasonics
  • Applied Mathematics
  • Speech and Hearing
