Abstract
Recently, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs) have been applied to voice conversion, since they can make use of non-parallel training data. In particular, Conditional Cycle-Consistent Generative Adversarial Networks (CC-GANs) and Cycle-Consistent Variational AutoEncoders (CycleVAEs) have shown promising results in many-to-many voice conversion among multiple speakers. However, conventional voice conversion studies using CC-GANs and CycleVAEs have considered only a relatively small number of speakers. In this paper, we extend the number of speakers to 100 and experimentally analyze the performance of these many-to-many voice conversion methods. The experiments show that the CC-GAN yields 4.5 % lower Mel-Cepstral Distortion (MCD) for a small number of speakers, whereas the CycleVAE yields 12.7 % lower MCD for a large number of speakers when training time is limited.
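For readers unfamiliar with the metric, the following is a minimal sketch of how MCD is commonly computed and how a percentage difference between two MCD values can be expressed. It assumes the widely used definition (per-frame distortion in dB, averaged over time-aligned frames, with the 0th energy coefficient excluded) and reads the percentages in the abstract as relative reductions; the abstract confirms neither, and the function names `mel_cepstral_distortion` and `relative_reduction` are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Coefficient 0
    (energy) is skipped, which is a common convention and an assumption
    here, not something stated in the abstract.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("frame sequences must be non-empty and time-aligned")

    # Constant factor of the standard MCD formula: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def relative_reduction(baseline_mcd, improved_mcd):
    """Percentage by which improved_mcd is lower than baseline_mcd."""
    return 100.0 * (baseline_mcd - improved_mcd) / baseline_mcd
```

With this reading, a method scoring 7.0 dB against a baseline of 8.0 dB would be reported as 12.5 % lower MCD; these two values are illustrative only and are not taken from the paper.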
Original language | English |
---|---|
Pages (from-to) | 351-358 |
Number of pages | 8 |
Journal | Journal of the Acoustical Society of Korea |
Volume | 41 |
Issue number | 3 |
Publication status | Published - 2022 |
Bibliographical note
Publisher Copyright: © 2022 The Acoustical Society of Korea.
Keywords
- Conditional Cycle-Consistent Generative Adversarial Network (CC-GAN)
- Cycle-Consistent Variational AutoEncoder (CycleVAE)
- Generative Adversarial Network (GAN)
- Variational AutoEncoder (VAE)
- Voice conversion
ASJC Scopus subject areas
- Signal Processing
- Instrumentation
- Acoustics and Ultrasonics
- Applied Mathematics
- Speech and Hearing