Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection

Jooyeon Lee, Woo Jeoung Nam, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)


Video Anomaly Detection (VAD) has traditionally been tackled with two main methodologies: the reconstruction-based approach and the prediction-based one. Because reconstruction-based methods learn to reproduce the input image, the model tends to learn little more than an identity function, which gives rise to the so-called generalizing issue: abnormal frames are reconstructed nearly as well as normal ones. Prediction-based methods, by contrast, learn to predict a future frame from several previous frames and are therefore less sensitive to the generalizing issue. However, it remains uncertain whether such models learn the spatio-temporal context of a video. Our intuition is that understanding this context plays a vital role in VAD, as it provides precise information on how the appearance of an event changes over a video clip. Hence, to fully exploit context information for anomaly detection in video, we design a transformer model with three contextual prediction streams: masked, whole, and partial. By learning to predict the missing frames of consecutive normal frames, our model effectively learns various normality patterns in the video, which leads to a high reconstruction error for abnormal cases that do not fit the learned context. To verify the effectiveness of our approach, we assess our model on three public benchmark datasets, UCSD Pedestrian 2, CUHK Avenue, and ShanghaiTech, and evaluate performance using an anomaly score based on reconstruction error. The results demonstrate that the proposed approach achieves competitive performance compared with existing video anomaly detection methods.
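The abstract says anomalies are scored by reconstruction error but does not give the formula. The following is a minimal sketch of one common way such a score could be computed, not the authors' implementation. It assumes per-frame mean squared error and min-max normalisation within a single video, neither of which is confirmed by the abstract; the names `frame_error` and `anomaly_scores` are hypothetical.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (assumed representation).
Frame = Sequence[Sequence[float]]


def frame_error(predicted: Frame, actual: Frame) -> float:
    """Mean squared error between a predicted frame and the observed frame."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(predicted, actual):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise per-frame errors to [0, 1] within one video.

    Assumption: a higher error means the frame fits the learned
    normal context less well, so it receives a higher anomaly score.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # Every frame is equally well predicted; nothing stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


# Toy video: three 2x2 observed frames, all black.
actual = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
# Hypothetical model outputs: exact, slightly off, and badly off.
predicted = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.1, 0.1], [0.1, 0.1]],
    [[1.0, 1.0], [1.0, 1.0]],
]

errors = [frame_error(p, a) for p, a in zip(predicted, actual)]
scores = anomaly_scores(errors)
```

In this toy case the third frame, which the hypothetical model predicts worst, receives the highest score. In practice the frames would be model outputs on test video, and a threshold or a ranking metric would be applied to the scores.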

Original language: English
Title of host publication: 2022 26th International Conference on Pattern Recognition, ICPR 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 7
ISBN (Electronic): 9781665490627
Publication status: Published - 2022
Event: 26th International Conference on Pattern Recognition, ICPR 2022 - Montreal, Canada
Duration: 2022 Aug 21 → 2022 Aug 25

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651


Conference: 26th International Conference on Pattern Recognition, ICPR 2022

Bibliographical note

Funding Information:
We thank Sueyeon Kim for proofreading the manuscript and for related discussions. This work was conducted by the Center for Applied Research in Artificial Intelligence (CARAI) under a grant funded by the Defense Acquisition Program Administration (DAPA) and the Agency for Defense Development (ADD) (UD190031RD).

Publisher Copyright:
© 2022 IEEE.

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

