Abstract
This paper presents LIMITLESS, a cross-lingual text-to-speech system based on hierarchical style transfer that can transfer prosody and voice style separately. Building upon HierSpeech++, we use its two-stage hierarchical speech synthesis framework, which consists of a text-to-vector (TTV) stage and a vector-to-speech stage. We modify the TTV only by adding a language embedding for each language to the text representation, and we use the hierarchical speech synthesizer without modification. We train the TTV model on 7 languages and 14 speakers from the Indic languages dataset released for LIMMITS 2024, and fine-tune it on the target speakers for Tracks 1 and 2. The results show that our framework transfers voice style robustly in terms of speaker similarity.
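The language-conditioning step described in the abstract can be illustrated with a small sketch. It is one possible reading, not the authors' implementation: it assumes the language embedding is a single learned vector per language that is added element-wise to every time step of the text representation. The sizes, the placeholder language labels, and the helper names (`make_table`, `add_language_embedding`) are hypothetical, and random values stand in for trained parameters.

```python
import random

# Hypothetical sizes and labels, chosen only for illustration.
HIDDEN_DIM = 4
LANGUAGES = ["lang_a", "lang_b", "lang_c"]  # placeholders; the paper uses 7 languages


def make_table(num_rows, dim, seed):
    """Build a num_rows x dim table of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


# One vector per language (random here; learned during training in a real model).
language_table = make_table(len(LANGUAGES), HIDDEN_DIM, seed=0)


def add_language_embedding(text_repr, language):
    """Add the language vector to every time step of the text representation.

    text_repr: list of T vectors, each of length HIDDEN_DIM.
    Assumption: the same language vector is broadcast over all T steps.
    """
    lang_vec = language_table[LANGUAGES.index(language)]
    return [[x + e for x, e in zip(frame, lang_vec)] for frame in text_repr]


# Stand-in for the encoded text sequence (T = 5 time steps).
text_repr = make_table(5, HIDDEN_DIM, seed=1)
conditioned = add_language_embedding(text_repr, "lang_b")
```

Under this assumption the conditioned sequence keeps the shape of the original text representation, which would be consistent with the abstract's statement that the downstream synthesizer is used without modification.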
| Original language | English |
|---|---|
| Title of host publication | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 25-26 |
| Number of pages | 2 |
| ISBN (Electronic) | 9798350374513 |
| Publication status | Published - 2024 |
| Event | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Seoul, Korea, Republic of. Duration: 2024 Apr 14 → 2024 Apr 19 |
Publication series
| Name | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings |
|---|---|
Conference
| Conference | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Seoul |
| Period | 2024 Apr 14 → 2024 Apr 19 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Cross-lingual TTS
- Multi-lingual TTS
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Networks and Communications
- Signal Processing
- Media Technology
- Acoustics and Ultrasonics