Abstract
Defense against adversarial attacks is critical for the reliability and safety of deep neural networks (DNNs). Current state-of-the-art defense methods achieve significant robustness against adversarial attacks. However, such methods cannot distinguish between adversarial examples (AEs) and normal examples (NEs). They therefore apply the same defense process to both kinds of examples when performing classification, which degrades performance on NEs. In this paper, we propose a novel defense method based on the student-teacher framework that minimizes the classification performance degradation on NEs by detecting AEs and then applying the defense process only to AEs. Focusing on the fact that distortion of the hidden layer features is unavoidable for an adversarial attack to succeed, we train the student network to predict the undistorted hidden layer features of the teacher network (the target DNN). Our method can therefore detect AEs through the difference in hidden layer features between the student and teacher networks, and then recover the classification result of AEs using the penultimate layer features predicted by the student network. Through extensive experiments on representative image classification benchmark datasets, i.e., CIFAR-10, CIFAR-100, and TinyImagenet, we demonstrate the superiority of our method in both detection and defense compared with state-of-the-art methods. Furthermore, we show that our method achieves robust detection and defense performance against a fully white-box attack, in which the attacker is assumed to know our entire detection and defense mechanism.
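The detect-then-recover idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the L2 distance as the detection score, the fixed `threshold`, the single linear classification head, and all function names are assumptions chosen for illustration, and the feature vectors are made-up toy values rather than outputs of real networks.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two feature vectors (assumed detection score)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_logits(features, weights, bias):
    """Apply a hypothetical final linear layer to penultimate-layer features."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def detect_and_classify(teacher_feats, student_feats, weights, bias, threshold):
    """Flag an input as adversarial when teacher and student features disagree.

    Normal inputs are classified from the teacher's own features; flagged
    inputs are classified from the features predicted by the student.
    """
    score = l2_distance(teacher_feats, student_feats)
    is_adversarial = score > threshold
    feats = student_feats if is_adversarial else teacher_feats
    return is_adversarial, argmax(linear_logits(feats, weights, bias))


# Toy two-class head (identity weights, zero bias); values are illustrative only.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
BIAS = [0.0, 0.0]

# Normal example: student closely predicts the teacher's features.
clean = detect_and_classify([2.0, 0.5], [1.9, 0.6], WEIGHTS, BIAS, threshold=1.0)

# Adversarial example: teacher features are distorted toward class 1,
# while the student still predicts features consistent with class 0.
attacked = detect_and_classify([0.4, 2.1], [1.8, 0.5], WEIGHTS, BIAS, threshold=1.0)

print(clean)     # (False, 0)
print(attacked)  # (True, 0)
```

In this toy setting the teacher alone would label the attacked input as class 1; routing only flagged inputs through the student's predicted features leaves the classification of normal inputs untouched, which is the property the abstract emphasizes.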
Original language | English |
---|---|
Pages (from-to) | 82742-82752 |
Number of pages | 11 |
Journal | IEEE Access |
Volume | 12 |
DOIs | |
Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Adversarial attack
- adversarial defense
- adversarial detection
- student-teacher network
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering