Zero-Shot Unseen Speaker Anonymization via Voice Conversion

Hyung Pil Chang, In Chul Yoo, Changhyeon Jeong, Dongsuk Yook

Research output: Contribution to journal › Article › peer-review

Abstract

Speech-based interfaces provide convenient methods for controlling various smart devices. For these interfaces to work reliably, considerable speech data with various noise and speaker characteristics must be collected to train the associated speech-processing models. Gathering spoken commands from actual users of devices can improve those devices' performance by familiarizing each device with the individual acoustic characteristic of its particular user's speech. However, the direct acquisition of spoken commands could threaten the privacy of users, as the spoken data would contain sensitive speaker-specific information. Speaker anonymization algorithms can be applied to suppress such sensitive information, while preserving the linguistic content of a user's speech. Previous speaker anonymization algorithms could handle only the voice of speakers who contributed to the training datasets. As speaker anonymization algorithms are typically applied to new speakers (who are absent from the training datasets), a method of handling such speakers (commonly referred to as 'unseen speakers') should be developed. In this paper, we propose a novel method that can effectively suppress the individual characteristics in an unseen speaker's voice, while retaining the linguistic content of the speech. It adopts zero-shot voice conversion methods for the unseen speaker anonymization. Since the proposed method utilizes speaker identity vectors commonly used in many-to-many voice conversion algorithms and does not modify the conversion algorithm itself, it can be easily combined with many other voice conversion algorithms. The proposed method is evaluated using the VCC2018 and VCTK corpora. Speaker identification rate and speech recognition rate are used for quantitative analysis. 
The experimental results showed that, after anonymization by the proposed method, the average speaker identification accuracy decreased by 92.3 percentage points and the average speech recognition accuracy decreased by 17.7 percentage points.
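The abstract's core idea is that anonymization can be achieved by swapping the speaker identity vector fed into a many-to-many voice conversion model, without modifying the conversion algorithm itself. The sketch below illustrates one plausible way to construct such a pseudo-speaker identity vector for an unseen speaker; the function name, the farthest-speaker averaging strategy, and the half-pool default are illustrative assumptions, not the paper's exact procedure.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two speaker embeddings (plain Python lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def anonymize_speaker_embedding(source_emb, pool_embs, top_k=None):
    """Build a pseudo-speaker identity vector for an unseen speaker.

    Hypothetical strategy: average the embeddings of the training-pool
    speakers that are most dissimilar to the source speaker, so the
    converted voice no longer resembles the original. The resulting
    vector can be fed to a many-to-many voice conversion model in place
    of the source speaker's identity vector.
    """
    if top_k is None:
        # Assumption: use the most distant half of the pool by default.
        top_k = max(1, len(pool_embs) // 2)
    # Rank pool embeddings from most to least distant from the source.
    ranked = sorted(pool_embs,
                    key=lambda e: cosine_distance(source_emb, e),
                    reverse=True)
    chosen = ranked[:top_k]
    # Average the most distant embeddings to form the pseudo-speaker.
    dim = len(source_emb)
    pseudo = [sum(e[i] for e in chosen) / top_k for i in range(dim)]
    # L2-normalize, as speaker identity vectors commonly are.
    norm = math.sqrt(sum(x * x for x in pseudo))
    return [x / norm for x in pseudo]
```

Because the anonymized identity vector simply replaces the original one at the conversion model's input, this step composes with any voice conversion system that conditions on a speaker embedding.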

Original language: English
Pages (from-to): 130190-130199
Number of pages: 10
Journal: IEEE Access
Volume: 10
DOIs
Publication status: Published - 2022

Bibliographical note

Funding Information:
This work was supported in part by the Basic Science Research Program through the National Research Foundation (NRF) of Korea through the Ministry of Science, ICT and Future Planning under Grant NRF-2017R1E1A1A01078157, and in part by the NRF under Project BK21 FOUR.

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Data privacy
  • speaker anonymization
  • unseen speakers
  • variational autoencoder
  • voice conversion
  • zero-shot learning

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
  • Electrical and Electronic Engineering
