Temporal-Invariant Video Representation Learning with Dynamic Temporal Resolutions

Seong Yun Jeong, Ho Joong Kim, Myeong Seok Oh, Gun Hee Lee, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent studies on similarity-based self-supervised representation learning tend to consider only a fixed temporal coverage of a given video. However, this approach limits a model's ability to learn temporally persistent representations, since it cannot reflect the spatial and temporal information gaps that arise from resolution variations. To overcome this limitation, this paper proposes a Temporal Adaptive Teacher-Student (TATS) framework that encourages the trained model to be robust to spatio-temporal variations. Our key approach is to optimize similarity-based learning using several views with dynamic temporal resolutions. From a given video, TATS captures spatio-temporally invariant cues for temporally persistent representation through cross-resolution correspondence between local and global views. Extensive experiments show that TATS achieves competitive performance on downstream tasks (action recognition and video retrieval) on the UCF101 and HMDB51 benchmarks.
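
The idea of drawing several views with dynamic temporal resolutions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the clip length, the stride sets, and the reading of "temporal resolution" as a frame stride are all assumptions made for illustration, and the abstract does not specify how TATS samples its views. The sketch only shows how a global view (wider temporal coverage) and a local view (narrower coverage) might be drawn from one video with a randomly chosen stride each.

```python
import random


def sample_clip_indices(num_frames, clip_len, stride, rng):
    """Return clip_len frame indices spaced `stride` apart from a random start.

    Hypothetical helper: a larger stride covers a longer stretch of the video
    at a coarser temporal resolution.
    """
    span = (clip_len - 1) * stride + 1  # number of frames the clip covers
    if span > num_frames:
        raise ValueError("clip does not fit in the video")
    start = rng.randrange(num_frames - span + 1)
    return [start + i * stride for i in range(clip_len)]


def sample_views(num_frames, rng, clip_len=8,
                 global_strides=(4, 8), local_strides=(1, 2)):
    """Draw one global and one local view, each with a randomly chosen stride.

    Assumption: global views use larger strides than local views. The stride
    sets here are illustrative values, not taken from the paper.
    """
    global_view = sample_clip_indices(
        num_frames, clip_len, rng.choice(global_strides), rng)
    local_view = sample_clip_indices(
        num_frames, clip_len, rng.choice(local_strides), rng)
    return global_view, local_view


if __name__ == "__main__":
    rng = random.Random(0)
    global_view, local_view = sample_views(num_frames=100, rng=rng)
    print("global view frame indices:", global_view)
    print("local view frame indices:", local_view)
```

In a teacher-student setup of the kind the abstract names, such views would then be encoded and compared with a similarity objective; that part is omitted here because the abstract gives no details of the loss or the networks.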

Original language: English
Title of host publication: AVSS 2022 - 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665463829
Publication status: Published - 2022
Event: 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2022 - Virtual, Online, Spain
Duration: 2022 Nov 29 to 2022 Dec 2

Publication series

Name: AVSS 2022 - 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance

Conference

Conference: 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2022
Country/Territory: Spain
City: Virtual, Online
Period: 22/11/29 to 22/12/2

Bibliographical note

Funding Information:
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University); No. B0101-15-0266, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis).

Publisher Copyright:
© 2022 IEEE.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems and Management
  • Media Technology
