TY - GEN
T1 - Low-quality Fake Audio Detection through Frequency Feature Masking
AU - Kwak, Il Youp
AU - Choi, Sunmook
AU - Yang, Jonghoon
AU - Lee, Yerin
AU - Han, Soyul
AU - Oh, Seungsang
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2022R1F1A1064273 and NRF-2020R1C1C1A01013020).
Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/10/14
Y1 - 2022/10/14
N2 - The first Audio Deep Synthesis Detection Challenge (ADD 2022) competition was held, dealing with audio deepfake detection, audio deep synthesis, audio fake games, and adversarial attacks. Our team participated in track 1, classifying bona fide and fake utterances in noisy environments. Through exploratory data analysis, we found that noisy signals appear in similar frequency bands for given voice samples. If a model is trained to rely heavily on information in frequency bands where noise exists, performance will be poor. In this paper, we propose a data augmentation method, Frequency Feature Masking (FFM), that randomly masks frequency bands. FFM makes a model robust by not relying on specific frequency bands and prevents overfitting. We applied FFM and mixup augmentation to five spectrogram-based deep neural network architectures that performed well for spoofing detection using mel-spectrogram and constant Q transform (CQT) features. Our best submission achieved 23.8% EER and ranked 3rd on track 1. To demonstrate the usefulness of our proposed FFM augmentation, we further experimented with FFM augmentation using the ASVspoof 2019 Logical Access (LA) datasets.
AB - The first Audio Deep Synthesis Detection Challenge (ADD 2022) competition was held, dealing with audio deepfake detection, audio deep synthesis, audio fake games, and adversarial attacks. Our team participated in track 1, classifying bona fide and fake utterances in noisy environments. Through exploratory data analysis, we found that noisy signals appear in similar frequency bands for given voice samples. If a model is trained to rely heavily on information in frequency bands where noise exists, performance will be poor. In this paper, we propose a data augmentation method, Frequency Feature Masking (FFM), that randomly masks frequency bands. FFM makes a model robust by not relying on specific frequency bands and prevents overfitting. We applied FFM and mixup augmentation to five spectrogram-based deep neural network architectures that performed well for spoofing detection using mel-spectrogram and constant Q transform (CQT) features. Our best submission achieved 23.8% EER and ranked 3rd on track 1. To demonstrate the usefulness of our proposed FFM augmentation, we further experimented with FFM augmentation using the ASVspoof 2019 Logical Access (LA) datasets.
KW - Audio deep synthesis
KW - audio data augmentation
KW - deep learning
KW - frequency feature masking
KW - low-quality audio
UR - http://www.scopus.com/inward/record.url?scp=85141649824&partnerID=8YFLogxK
U2 - 10.1145/3552466.3556533
DO - 10.1145/3552466.3556533
M3 - Conference contribution
AN - SCOPUS:85141649824
T3 - DDAM 2022 - Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia
SP - 9
EP - 17
BT - DDAM 2022 - Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia
PB - Association for Computing Machinery, Inc
T2 - 1st International Workshop on Deepfake Detection for Audio Multimedia, DDAM 2022
Y2 - 14 October 2022
ER -