Abstract
Voice conversion (VC) changes the voice of an utterance while preserving its linguistic content. It synthesizes speech from two samples: the source sample provides the content, and the target sample provides the style representation. VC research has therefore focused on designing the information flow so that content and style in speech are disentangled. However, the separated representations are damaged when they pass through a sparse subspace. In addition, VC models suffer from a training-inference mismatch: they use only one sample during training. As a result, the model extracts inappropriate content and style representations and generates low-quality speech during inference. To address this mismatch, we propose StyleVC, which uses adversarial style generalization. First, we propose style generalization, which captures a global style representation and restricts the model from copying information. Second, we use a pitch predictor to estimate pitch information from the content and style representations. Third, we use adversarial training so that the model generates more realistic speech. Finally, we demonstrate that the proposed model can generate high-quality speech. The experimental results also show that StyleVC significantly improves both the extraction of the desired features and the audio quality during inference.
Original language | English |
---|---|
Title of host publication | 2022 26th International Conference on Pattern Recognition, ICPR 2022 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 23-30 |
Number of pages | 8 |
ISBN (Electronic) | 9781665490627 |
DOIs | |
Publication status | Published - 2022 |
Event | 26th International Conference on Pattern Recognition, ICPR 2022 - Montreal, Canada; Duration: 2022 Aug 21 → 2022 Aug 25 |
Publication series
Name | Proceedings - International Conference on Pattern Recognition |
---|---|
Volume | 2022-August |
ISSN (Print) | 1051-4651 |
Conference
Conference | 26th International Conference on Pattern Recognition, ICPR 2022 |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 2022 Aug 21 → 2022 Aug 25 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition