Action-aware Masking Network with Group-based Attention for Temporal Action Localization

Tae Kyung Kang, Gun Hee Lee, Kyung Min Jin, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Temporal Action Localization (TAL) is a significant and challenging task that searches for subtle human activities in an untrimmed video. To extract snippet-level video features, existing TAL methods commonly use video encoders pre-trained on short-video classification datasets. However, because each snippet carries only short and limited temporal information, the snippet-level features can be ambiguous between consecutive frames, which disrupts the precise prediction of action instances. Several methods incorporating temporal relations have been proposed to mitigate this problem; however, they still suffer from poor video features. To address this issue, we propose a novel temporal action localization framework called the Action-aware Masking Network (AMNet). Our method simultaneously refines video features using action-aware attention and considers inherent temporal relations using self-attention and cross-attention mechanisms. First, we present an Action Masking Encoder (AME) that generates an action-aware mask to represent positive characteristics, which is then used to refine snippet-level features to be more salient around actions. Second, we design a Group Attention Module (GAM), which models relations of temporal information and exchanges mutual information by dividing the features into two groups, i.e., long and short groups. Extensive experiments and ablation studies on two primary benchmark datasets demonstrate the effectiveness of AMNet, and our method achieves state-of-the-art performance on THUMOS-14 and ActivityNet1.3.
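
The two components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the mask is applied or how the long and short groups are formed, so the residual scaling `f * (1 + mask)`, the channel-wise split into two halves, and all function names below are assumptions made purely for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def apply_action_mask(features, actionness_logits):
    """Assumed refinement: scale each snippet feature by (1 + mask),
    where mask = sigmoid(logit), so action-like snippets become more salient."""
    masks = [sigmoid(z) for z in actionness_logits]
    refined = [[v * (1.0 + m) for v in f] for f, m in zip(features, masks)]
    return refined, masks


def attend(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out


def group_attention(features):
    """Assumed grouping: split channels into two equal halves (stand-ins for
    the long and short groups), self-attend within each group, then
    cross-attend so the groups exchange information."""
    half = len(features[0]) // 2
    g_long = [f[:half] for f in features]
    g_short = [f[half:] for f in features]
    long_self = attend(g_long, g_long, g_long)
    short_self = attend(g_short, g_short, g_short)
    long_cross = attend(long_self, short_self, short_self)
    short_cross = attend(short_self, long_self, long_self)
    # Concatenate the two groups back into one feature per snippet.
    return [a + b for a, b in zip(long_cross, short_cross)]


# Hypothetical input: 3 snippets with 4-dimensional features.
snippets = [[1.0, 0.0, 2.0, 1.0], [0.5, 1.5, 0.0, 1.0], [2.0, 1.0, 1.0, 0.0]]
logits = [0.0, 3.0, -3.0]  # hypothetical per-snippet actionness scores
refined, masks = apply_action_mask(snippets, logits)
fused = group_attention(refined)
```

In this sketch a snippet with a high actionness score is amplified by nearly a factor of two, while a low-scoring snippet is left almost unchanged; the attention step then mixes information across time within and between the two groups.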

Original language: English
Title of host publication: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6047-6056
Number of pages: 10
ISBN (Electronic): 9781665493468
Publication status: Published - 2023
Event: 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023 - Waikoloa, United States
Duration: 2023 Jan 3 → 2023 Jan 7

Publication series

Name: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023

Conference

Conference: 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
Country/Territory: United States
City: Waikoloa
Period: 23/1/3 → 23/1/7

Bibliographical note

Funding Information:
Acknowledgement: This work was partially supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University); No. 2022-0-00984, Development of Artificial Intelligence Technology for Personalized Plug-and-Play Explanation and Verification of Explanation).

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Algorithms: Video recognition and understanding (tracking, action recognition, etc.)
  • Machine learning architectures, formulations, and algorithms (including transfer, low-shot, semi-, self-, and un-supervised learning)

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
