Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech

  • Taesoo Kim
  • Jinju Kim
  • Dongchan Kim
  • Jong Hwan Ko*
  • Gyeong Moon Park*

* Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

The rapid advancement of Zero-Shot Text-to-Speech (ZS-TTS) technology has enabled high-fidelity voice synthesis from minimal audio cues, raising significant privacy and ethical concerns. Despite the threats to voice privacy, research to selectively remove the knowledge to replicate unwanted individual voices from pre-trained model parameters has not been explored. In this paper, we address the new challenge of speaker identity unlearning for ZS-TTS systems. To meet this goal, we propose the first machine unlearning frameworks for ZS-TTS, especially Teacher-Guided Unlearning (TGU), designed to ensure the model forgets designated speaker identities while retaining its ability to generate accurate speech for other speakers. Our proposed methods incorporate randomness to prevent consistent replication of forget speakers’ voices, assuring unlearned identities remain untraceable. Additionally, we propose a new evaluation metric, speaker-Zero Retrain Forgetting (spk-ZRF). This assesses the model’s ability to disregard prompts associated with forgotten speakers, effectively neutralizing its knowledge of these voices. The experiments conducted on the state-of-the-art model demonstrate that TGU prevents the model from replicating forget speakers’ voices while maintaining high quality for other speakers.

Original language: English
Pages (from-to): 30176-30198
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 267
Publication status: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 2025 Jul 13 → 2025 Jul 19

Bibliographical note

Publisher Copyright:
© 2025, ML Research Press. All rights reserved.

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
