TY - GEN
T1 - Voice presentation attack detection through text-converted voice command analysis
AU - Kwak, Il Youp
AU - Huh, Jun Ho
AU - Han, Seung Taek
AU - Kim, Iljoo
AU - Yoon, Jiwon
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/5/2
Y1 - 2019/5/2
N2 - Voice assistants are quickly being upgraded to support advanced, security-critical commands such as unlocking devices, checking emails, and making payments. In this paper, we explore the feasibility of using users’ text-converted voice command utterances as classification features to help identify users’ genuine commands, and detect suspicious commands. To maintain high detection accuracy, our approach starts with a globally trained attack detection model (immediately available for new users), and gradually switches to a user-specific model tailored to the utterance patterns of a target user. To evaluate accuracy, we used a real-world voice assistant dataset consisting of about 34.6 million voice commands collected from 2.6 million users. Our evaluation results show that this approach is capable of achieving about 3.4% equal error rate (EER), detecting 95.7% of attacks when an optimal threshold value is used. As for those who frequently use security-critical (attack-like) commands, we still achieve EER below 5%.
KW - Attack detection
KW - Voice assistant security
KW - Voice command analysis
UR - http://www.scopus.com/inward/record.url?scp=85067622621&partnerID=8YFLogxK
U2 - 10.1145/3290605.3300828
DO - 10.1145/3290605.3300828
M3 - Conference contribution
AN - SCOPUS:85067622621
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2019 - Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
T2 - 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019
Y2 - 4 May 2019 through 9 May 2019
ER -