TY - GEN
T1 - Interpretable human action recognition in compressed domain
AU - Srinivasan, Vignesh
AU - Lapuschkin, Sebastian
AU - Hellge, Cornelius
AU - Müller, Klaus-Robert
AU - Samek, Wojciech
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/6/16
Y1 - 2017/6/16
AB - Compressed domain human action recognition algorithms are extremely efficient because they require only a partial decoding of the video bit stream. However, the question of what exactly makes these algorithms decide for a particular action is still a mystery. In this paper, we present a general method, Layer-wise Relevance Propagation (LRP), to understand and interpret action recognition algorithms and apply it to a state-of-the-art compressed domain method based on Fisher vector encoding and SVM classification. By using LRP, the classifier's decisions are propagated back through every step in the action recognition pipeline until the input is reached. This methodology makes it possible to identify where and when the important (from the classifier's perspective) action happens in the video. To our knowledge, this is the first work to interpret a compressed domain action recognition algorithm. We evaluate our method on the HMDB51 dataset and show that in many cases a few significant frames contribute most towards the prediction of a particular class for the video.
KW - Action recognition
KW - compressed domain
KW - Fisher vector encoding
KW - interpretable classification
KW - motion vectors
UR - http://www.scopus.com/inward/record.url?scp=85023764863&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2017.7952445
DO - 10.1109/ICASSP.2017.7952445
M3 - Conference contribution
AN - SCOPUS:85023764863
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1692
EP - 1696
BT - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Y2 - 5 March 2017 through 9 March 2017
ER -