Temporal-Invariant Video Representation Learning with Dynamic Temporal Resolutions

Seong Yun Jeong, Ho Joong Kim, Myeong Seok Oh, Gun Hee Lee, Seong Whan Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Recent studies on similarity-based self-supervised representation learning tend to consider only a fixed temporal coverage of a given video. However, this approach limits a model's ability to learn temporally persistent representations, since it cannot reflect the spatial and temporal information gaps that arise from resolution variations. To overcome this limitation, this paper proposes a Temporal Adaptive Teacher-Student (TATS) framework that encourages the trained model to be robust to spatio-temporal variations. The key idea is to optimize similarity-based learning over several views with dynamic temporal resolutions. From a given video, TATS captures spatio-temporally invariant cues for temporally persistent representations through cross-resolution correspondence between local and global views. Extensive experiments show that TATS achieves competitive performance on downstream tasks (action recognition and video retrieval) on the UCF101 and HMDB51 benchmarks.

    Original language: English
    Title of host publication: AVSS 2022 - 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    ISBN (Electronic): 9781665463829
    Publication status: Published - 2022
    Event: 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2022 - Virtual, Online, Spain
    Duration: 2022 Nov 29 to 2022 Dec 2

    Publication series

    Name: AVSS 2022 - 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance

    Conference

    Conference: 18th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2022
    Country/Territory: Spain
    City: Virtual, Online
    Period: 22/11/29 to 22/12/2

    Bibliographical note

    Funding Information:
    This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University); No. B0101-15-0266, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis).

    Publisher Copyright:
    © 2022 IEEE.

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
    • Computer Vision and Pattern Recognition
    • Information Systems and Management
    • Media Technology
