Description Attribute-Enhanced Spatio-Temporal Zero-Shot Action Recognition

  • Yehna Kim
  • Ho Joong Kim
  • Seong Whan Lee*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Vision-Language Models (VLMs) have shown remarkable performance in zero-shot action recognition by learning the correlation between video embeddings and class embeddings. However, relying solely on action class names for semantic information is problematic because of multi-semantic words, that is, words with multiple meanings. These words hinder the model's ability to accurately capture the intended concepts of actions. We propose a novel approach that leverages web-crawled descriptions and uses a large language model to extract keywords. This method reduces the reliance on human annotators and avoids the exhaustive manual process of attribute data creation. Moreover, we introduce a spatio-temporal interaction module that focuses on objects and action units to align description attributes with video content. In zero-shot experiments, our model achieves 81.0%, 53.1%, and 68.9% on UCF-101, HMDB-51, and Kinetics-600, respectively, which demonstrates the transferability of our model to downstream tasks.
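
As a rough illustration of the zero-shot setup the abstract refers to (matching a video embedding against class embeddings), the following minimal sketch picks the class whose embedding has the highest cosine similarity to the video embedding. It is not the authors' model: the function names `cosine` and `zero_shot_predict`, the class names, and the three-dimensional toy vectors are all hypothetical, and real embeddings would come from a trained vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(video_emb, class_embs):
    """Return the best-matching class name and all similarity scores.

    class_embs maps a class name to its text embedding. No classifier is
    trained on the target classes; prediction is by similarity alone.
    """
    scores = {name: cosine(video_emb, emb) for name, emb in class_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-written embeddings (assumption: stand-ins for model outputs).
class_embs = {
    "archery": [1.0, 0.0, 0.0],
    "bowling": [0.0, 1.0, 0.0],
}
video_emb = [0.9, 0.1, 0.0]

label, scores = zero_shot_predict(video_emb, class_embs)
print(label)
```

In this framing, a class embedding built only from an ambiguous class name can land far from the intended concept, which is the motivation the abstract gives for enriching it with description-derived keywords.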

Original language: English
Title of host publication: Pattern Recognition and Artificial Intelligence - 4th International Conference, ICPRAI 2024, Proceedings
Editors: Christian Wallraven, Cheng-Lin Liu, Arun Ross
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 296-309
Number of pages: 14
ISBN (Print): 9789819787043
DOIs
Publication status: Published - 2025
Event: 4th International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2024 - Jeju Island, Korea, Republic of
Duration: 2024 Jul 3 → 2024 Jul 6

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14893 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Conference on Pattern Recognition and Artificial Intelligence, ICPRAI 2024
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 24/7/3 → 24/7/6

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords

  • Action recognition
  • Vision-language model
  • Zero-shot transfer

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
