Three-stream fusion network for first-person interaction recognition

Ye Ji Kim, Dong Gyu Lee, Seong Whan Lee

    Research output: Contribution to journal, Article, peer-review

    5 Citations (Scopus)

    Abstract

    First-person interaction recognition is a challenging task because the camera wearer's movement makes video conditions unstable. For human interaction recognition from a first-person viewpoint, this paper proposes a three-stream fusion network with two main parts: a three-stream architecture and three-stream correlation fusion. The three-stream architecture captures the characteristics of the target appearance, target motion, and camera ego-motion. Meanwhile, the three-stream correlation fusion combines the feature maps of the three streams to account for the correlations among target appearance, target motion, and camera ego-motion. The fused feature vector is robust to camera movement and compensates for the noise introduced by camera ego-motion. Short-term intervals are modeled using the fused feature vector, and a long short-term memory (LSTM) model captures the temporal dynamics of the video. We evaluated the proposed method on two public benchmark datasets to validate the effectiveness of our approach. The experimental results show that the proposed fusion method generated a discriminative feature vector and that our network outperformed all competing activity recognition methods on first-person videos with considerable camera ego-motion.
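    The pipeline outlined in the abstract (per-interval features from three streams, a fusion step, then an LSTM over the sequence of fused vectors) can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the authors' implementation: the fusion operator below (concatenating the three stream vectors with their pairwise element-wise products) is an assumed stand-in for the paper's three-stream correlation fusion, the stream features are toy numbers rather than CNN feature maps, and the names correlation_fuse and TinyLSTM are invented for this example.

```python
import math
import random
from typing import List

Vector = List[float]


def correlation_fuse(appearance: Vector, motion: Vector, ego: Vector) -> Vector:
    """Hypothetical fusion (NOT the paper's operator): concatenate the three
    stream features with their pairwise element-wise products so that
    cross-stream interactions appear explicitly in the fused vector."""
    if not (len(appearance) == len(motion) == len(ego)):
        raise ValueError("all three streams must have the same feature size")
    app_mot = [a * m for a, m in zip(appearance, motion)]
    app_ego = [a * e for a, e in zip(appearance, ego)]
    mot_ego = [m * e for m, e in zip(motion, ego)]
    return appearance + motion + ego + app_mot + app_ego + mot_ego


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with small random weights, for illustration only."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # each gate reads [x_t; h_{t-1}]
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in ("i", "f", "o", "g")
        }
        self.biases = {gate: [0.0] * hidden_size for gate in ("i", "f", "o", "g")}

    def _affine(self, gate: str, z: Vector) -> Vector:
        # W_gate @ z + b_gate, written out with plain lists
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def run(self, sequence: List[Vector]) -> Vector:
        """Feed the fused interval vectors in order; return the last hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = x + h  # concatenate input and previous hidden state
            i = [sigmoid(v) for v in self._affine("i", z)]    # input gate
            f = [sigmoid(v) for v in self._affine("f", z)]    # forget gate
            o = [sigmoid(v) for v in self._affine("o", z)]    # output gate
            g = [math.tanh(v) for v in self._affine("g", z)]  # candidate cell
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


# Toy data: four short-term intervals, 3-dimensional features per stream.
intervals = [
    ([0.1 * t, 0.2, 0.3], [0.0, 0.1 * t, 0.2], [0.05, 0.05, 0.1 * t])
    for t in range(4)
]
fused_sequence = [correlation_fuse(a, m, e) for a, m, e in intervals]
lstm = TinyLSTM(input_size=len(fused_sequence[0]), hidden_size=8)
video_descriptor = lstm.run(fused_sequence)
print(len(fused_sequence[0]), len(video_descriptor))  # 18 8
```

    Pairwise products are used here only as one simple way to make interactions between streams explicit before temporal modeling; the paper's fusion operates on feature maps and may work quite differently.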

    Original language: English
    Article number: 107279
    Journal: Pattern Recognition
    Volume: 103
    Publication status: Published, July 2020

    Bibliographical note

    Funding Information:
    This work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2019-0-00079, Department of Artificial Intelligence, Korea University] and [No. 2014-0-00059, Development of Predictive Visual Intelligence Technology].

    Publisher Copyright:
    © 2020

    Keywords

    • Camera ego-motion
    • First-person interaction recognition
    • First-person vision
    • Three-stream correlation fusion
    • Three-stream fusion network

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Computer Vision and Pattern Recognition
    • Artificial Intelligence
