Abstract
This paper presents LIMITLESS, a cross-lingual text-to-speech (TTS) system based on hierarchical style transfer that transfers prosody and voice style separately. Building upon HierSpeech++, we use a two-stage hierarchical speech synthesis framework consisting of a text-to-vector (TTV) stage and a vector-to-speech stage. We modify the TTV model only by adding a per-language embedding to the text representation, and we use the hierarchical speech synthesizer without modification. We train the TTV model on 7 languages and 14 speakers from the Indic languages dataset released for LIMMITS 2024, and then fine-tune it on the target speakers for Tracks 1 and 2. The results show that our framework transfers voice style robustly in terms of speaker similarity.
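The language-conditioning step described in the abstract (adding a language embedding to the text representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hidden size of 4, the random initialization, and the use of plain Python lists in place of a learned embedding layer are all assumptions made for illustration. Only the count of seven languages comes from the abstract.

```python
import random

NUM_LANGUAGES = 7  # from the abstract: the TTV model is trained on 7 languages
HIDDEN_DIM = 4     # hypothetical hidden size, kept tiny for readability


def make_language_table(num_languages, dim, seed=0):
    """Build one vector per language (a stand-in for a learned embedding table)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_languages)]


def add_language_embedding(text_repr, language_table, language_id):
    """Add the chosen language vector to every token vector.

    text_repr is a list of token vectors (sequence_length x dim).
    A new list is returned; the input is left unchanged.
    """
    lang_vec = language_table[language_id]
    return [[x + e for x, e in zip(token_vec, lang_vec)]
            for token_vec in text_repr]


language_table = make_language_table(NUM_LANGUAGES, HIDDEN_DIM)
text_repr = [[0.0] * HIDDEN_DIM for _ in range(3)]  # three dummy token vectors
conditioned = add_language_embedding(text_repr, language_table, language_id=2)
```

In a real model the table would be a trainable embedding layer and the text representation would come from the text encoder. The sketch only shows that the same language vector is broadcast across all positions of the sequence.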
Original language | English |
---|---|
Title of host publication | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 25-26 |
Number of pages | 2 |
ISBN (Electronic) | 9798350374513 |
Publication status | Published - 2024 |
Event | 49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Seoul, Korea, Republic of. Duration: 2024 Apr 14 → 2024 Apr 19 |
Conference
Conference | 49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 24/4/14 → 24/4/19 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Cross-lingual TTS
- Multi-lingual TTS
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Networks and Communications
- Signal Processing
- Media Technology
- Acoustics and Ultrasonics