TY - GEN
T1 - Hierarchical spatial object detection for ATM vandalism surveillance
AU - Lee, Jun Yeop
AU - Cho, Chul Jin
AU - Han, David K.
AU - Ko, Hanseok
N1 - Funding Information:
Authors of Korea University were supported by the National Research Foundation (NRF) grant funded by the MSIP of Korea (No. 2017R1A2B4012720). David Han’s contribution was supported by the US Army Research Laboratory.
Publisher Copyright:
© 2018 IEEE.
PY - 2019/2/11
Y1 - 2019/2/11
AB - In this paper, a multi-modal classification method is proposed for recognizing vandalism against Automatic Teller Machines (ATMs). A model based on visual and textual information is developed to identify external threats to ATMs. The model discriminates threatening behaviors from benign ones in the image. It provides a level of confidence in the threat recognition by coupling visual object classification with a word vector distance measure. To achieve this goal, real-time object detection based on a Region Convolutional Neural Network (R-CNN) first detects objects in the scene; a word embedding technique then measures the distance between each detected object label and predefined tools assumed to be used for vandalizing ATMs. The similarity measure from the word embedding not only determines whether the scene may lead to nefarious activity but also provides a level of confidence in the occurrence of such incidents. The experimental evaluation shows that the method is effective and delivers a quantitative measure of the decisions it makes.
UR - http://www.scopus.com/inward/record.url?scp=85063269552&partnerID=8YFLogxK
U2 - 10.1109/AVSS.2018.8639154
DO - 10.1109/AVSS.2018.8639154
M3 - Conference contribution
AN - SCOPUS:85063269552
T3 - Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance
BT - Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2018
Y2 - 27 November 2018 through 30 November 2018
ER -