VibE-SVC: Vibrato Extraction with High-frequency F0 Contour for Singing Voice Conversion

Research output: Contribution to journal › Conference article › peer-review

Abstract

Controlling singing style is crucial for achieving an expressive and natural singing voice. Among the various style factors, vibrato plays a key role in conveying emotions and enhancing musical depth. However, modeling vibrato remains challenging due to its dynamic nature, making it difficult to control in singing voice conversion. To address this, we propose VibE-SVC, a controllable singing voice conversion model that explicitly extracts and manipulates vibrato using discrete wavelet transform. Unlike previous methods that model vibrato implicitly, our approach decomposes the F0 contour into frequency components, enabling precise transfer. This allows vibrato control for enhanced flexibility. Experimental results show that VibE-SVC effectively transforms singing styles while preserving speaker similarity. Both subjective and objective evaluations confirm high-quality conversion.
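
As a rough illustration of the decomposition idea described in the abstract, the sketch below splits a synthetic F0 contour into a smooth low-frequency part and a high-frequency residual using a multi-level Haar discrete wavelet transform. It is not the authors' implementation: the Haar wavelet, the 5-level depth, the 200 frames-per-second rate, the synthetic contour, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (length must be even)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in pairs]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out

def split_f0(f0, levels):
    """Split f0 into (smooth, residual) so that smooth + residual == f0.

    The smooth part is rebuilt from the coarsest approximation with every
    detail band zeroed; the residual holds the high-frequency content.
    len(f0) must be divisible by 2 ** levels.
    """
    approx = list(f0)
    band_sizes = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        band_sizes.append(len(detail))
    smooth = approx
    for size in reversed(band_sizes):
        smooth = haar_idwt(smooth, [0.0] * size)
    residual = [x - s for x, s in zip(f0, smooth)]
    return smooth, residual

def rescale_vibrato(smooth, vibrato, gain):
    """Recombine the two parts with the high-frequency part scaled by gain."""
    return [s + gain * v for s, v in zip(smooth, vibrato)]

# Hypothetical test contour: a 220 Hz note with 6 Hz vibrato of 5 Hz depth,
# sampled at an assumed 200 frames per second.
FRAME_RATE = 200
f0 = [220.0 + 5.0 * math.sin(2.0 * math.pi * 6.0 * n / FRAME_RATE)
      for n in range(256)]

smooth, vibrato = split_f0(f0, levels=5)
flattened = rescale_vibrato(smooth, vibrato, gain=0.0)    # vibrato removed
exaggerated = rescale_vibrato(smooth, vibrato, gain=2.0)  # vibrato doubled
```

In this toy setup, scaling the residual before adding it back gives a simple handle on vibrato depth, and swapping the residual of one contour into another is one way to picture vibrato transfer. A smoother wavelet than Haar would likely separate the bands more cleanly.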

Original language: English
Pages (from-to): 1233-1237
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 2025 Aug 17 → 2025 Aug 21

Bibliographical note

Publisher Copyright:
© 2025 International Speech Communication Association. All rights reserved.

Keywords

  • discrete wavelet transform
  • singing style transfer
  • singing voice conversion
  • vibrato control

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Language and Linguistics
  • Modelling and Simulation
  • Human-Computer Interaction
