C2I-CAT: Class-to-Image Cross Attention Transformer for Out-of-Distribution Detection

Jaeho Chung, Seokho Cho, Hyunjun Choi, Daeung Jo, Yoonho Jung, Jin Young Choi

Research output: Contribution to journal › Article › peer-review

Abstract

In this work, we empirically find that the Vision Transformer (ViT) fails to extract object-centric features when applied to out-of-distribution (OOD) detection. To obtain object-centric attention, we design an additional module that performs cross-attention between class-wise token proxies and the feature-token sequence of an input image. For inference suited to this cross-attention structure with multiple class-wise token proxies, we propose a score ensemble that can be applied to any scoring function. Compared to ViT, the proposed inference scheme achieves superior performance by synergizing with our cross-attention structure. Through experiments, we demonstrate that the proposed cross-attention structure with score-ensemble inference substantially improves near-OOD detection performance: the FPR95 improvement over the state-of-the-art method is 2.55% on CIFAR-10 and 2.67% on CIFAR-100, while maintaining competitive classification accuracy.
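The mechanism described in the abstract can be sketched as follows: class-wise token proxies act as queries that attend over an image's feature-token sequence, and a per-class OOD score is ensembled across the resulting class-conditioned features. This is a minimal NumPy illustration, not the paper's implementation; the shapes, the scaled-dot-product form of the attention, the mean-based ensemble, and the max-logit-style scoring function are all assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_to_image_cross_attention(class_proxies, image_tokens):
    """Class-wise token proxies (queries) attend over image feature tokens.

    class_proxies: (C, d) -- one proxy token per class (assumed shape)
    image_tokens:  (N, d) -- ViT feature-token sequence for one image
    Returns: (C, d) class-conditioned image features.
    """
    d = class_proxies.shape[-1]
    scores = class_proxies @ image_tokens.T / np.sqrt(d)  # (C, N)
    attn = softmax(scores, axis=-1)                       # attention over image tokens
    return attn @ image_tokens                            # (C, d)

def ood_score_ensemble(class_features, score_fn):
    """Ensemble any per-feature scoring function over the C class-wise outputs.

    The paper's ensemble applies to any scoring function; averaging across
    class proxies here is an illustrative assumption.
    """
    return float(np.mean([score_fn(f) for f in class_features]))

# Toy usage with random features and a max-style score (illustrative only)
rng = np.random.default_rng(0)
proxies = rng.standard_normal((10, 64))   # e.g. CIFAR-10: 10 class proxies
tokens = rng.standard_normal((197, 64))   # e.g. ViT-B/16: 196 patches + [CLS]
feats = class_to_image_cross_attention(proxies, tokens)
score = ood_score_ensemble(feats, lambda f: f.max())
```

A higher (or lower, depending on the chosen scoring function) ensembled score would then be thresholded to flag an input as in- or out-of-distribution.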

Original language: English
Pages (from-to): 62793-62803
Number of pages: 11
Journal: IEEE Access
Volume: 12
DOIs
Publication status: Published - 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Near out-of-distribution (OOD) detection
  • class-wise cross attention
  • vision transformer

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
