TY - JOUR
T1 - Overlapped Frequency-Distributed Network
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
AU - Choi, Sunmook
AU - Kwak, Il Youp
AU - Oh, Seungsang
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2020R1C1C1A01013020) and the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00033, 50%, Study on Quantum Security Evaluation of Cryptography based on Computational Quantum Complexity).
Publisher Copyright:
Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
AB - Numerous IT companies around the world are developing and deploying artificial voice assistants in their products, but these assistants are still vulnerable to spoofing attacks. Since 2015, the “Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof)” has been held every two years to encourage the design of systems that can detect spoofing attacks. In this paper, we focus on developing spoofing countermeasure systems based mainly on Convolutional Neural Networks (CNNs). However, CNNs are translation invariant, which may cause a loss of frequency information when a spectrogram is used as input. Hence, we propose models that split inputs along the frequency axis: 1) the Overlapped Frequency-Distributed (OFD) model and 2) the Non-overlapped Frequency-Distributed (Non-OFD) model. Using the ASVspoof 2019 dataset, we measured their performance with two different activations: ReLU and Max feature map (MFM). The best-performing model on the LA dataset is the Non-OFD model with ReLU, which achieved an equal error rate (EER) of 1.35%, and the best-performing model on the PA dataset is the OFD model with MFM, which achieved an EER of 0.35%.
KW - Deep learning
KW - audio deep synthesis
KW - countermeasure
KW - fake audio detection
KW - spoofing
UR - http://www.scopus.com/inward/record.url?scp=85140044841&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-657
DO - 10.21437/Interspeech.2022-657
M3 - Conference article
AN - SCOPUS:85140044841
SN - 2308-457X
VL - 2022-September
SP - 3558
EP - 3562
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 18 September 2022 through 22 September 2022
ER -