Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics

Jaehyuk Heo, Seungwan Seo, Pilsung Kang

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


Deep-learning models have demonstrated remarkable performance in a variety of fields, owing to advancements in computational power and the availability of extensive datasets for training large-scale models. Nonetheless, these models inherently possess a vulnerability wherein even small alterations to the input can lead to substantially different outputs. Consequently, it is imperative to assess the robustness of deep-learning models prior to relying on their decision-making capabilities. In this study, we investigate the adversarial robustness of convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid CNN+ViT models, which represent prevalent architectures in computer vision. Our evaluation is grounded on four novel model-sensitivity metrics that we introduce. These metrics are evaluated in the context of random noise and gradient-based adversarial perturbations. To ensure a fair comparison, we employ models with comparable capacities within each group and conduct experiments separately, utilizing ImageNet-1K and ImageNet-21K as pretraining data. Under this fair comparison, our experimental results provide empirical evidence that ViT-based models exhibit higher adversarial robustness than their CNN-based counterparts, helping to dispel doubts about the findings of prior studies. Additionally, our novel metrics contribute new insights into previously unconfirmed characteristics of these models.
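The abstract distinguishes random-noise from gradient-based adversarial perturbations. As a hedged illustration only (not the paper's metrics or models), the following NumPy sketch applies a fast-gradient-sign-style step to a toy logistic classifier; the `fgsm_perturb` helper and the logistic model are hypothetical stand-ins, whereas the paper evaluates full CNN and ViT architectures.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One gradient-sign perturbation step on a logistic classifier.

    Moves the input x in the direction that increases the binary
    cross-entropy loss: x' = x + eps * sign(dL/dx). This is a toy
    stand-in for the gradient-based attacks the abstract refers to.
    """
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid prediction
    grad_x = (p - y) * w           # dL/dx for binary cross-entropy
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1.0

x_adv = fgsm_perturb(x, w, b, y, eps=0.1)
# Each coordinate moves by at most eps, i.e. an L-infinity-bounded change.
print(np.max(np.abs(x_adv - x)))
```

Unlike random noise of the same magnitude, this perturbation is aligned with the loss gradient, which is why gradient-based attacks typically degrade accuracy far more per unit of perturbation.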

Original language: English
Article number: 103800
Journal: Computer Vision and Image Understanding
Publication status: Published - 2023 Oct

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2022R1A2C2005455). This work was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00471, Development of Autonomous Control Technology for Error-Free Information Infrastructure Based on Modeling & Optimization).

Publisher Copyright:
© 2023


Keywords

  • Adversarial robustness
  • Computer vision

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition

