Abstract
Although many approaches to multi-human pose estimation in videos have shown strong results, they require densely annotated data, which entails excessive manual labor. Furthermore, occlusion and motion blur inevitably lead to poor estimation performance. To address these problems, we propose a method that leverages an attention mask for occluded joints and encodes temporal dependency between frames using transformers. First, our framework composes different combinations of sparsely annotated frames that denote the track of the overall joint movement. From these combinations, we derive an occlusion attention mask that enables encoding occlusion-aware heatmaps as a semi-supervised task. Second, the proposed temporal encoder employs a transformer architecture to effectively aggregate the temporal relationship and keypoint-wise attention from each time step and accurately refine the target frame's final pose estimation. We achieve state-of-the-art pose estimation results on the PoseTrack2017 and PoseTrack2018 datasets and demonstrate the robustness of our approach to occlusion and motion blur in sparsely annotated video data.
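The general idea of masking occluded joints while aggregating information across frames can be illustrated with a small sketch. This is not the paper's implementation: it is a generic scaled dot-product attention over time for a single joint, written in plain Python, and the function name `masked_temporal_attention`, the per-frame feature vectors, and the boolean `visible` flags are all hypothetical names introduced only for illustration.

```python
import math


def masked_temporal_attention(query, keys, values, visible):
    """Attend over T frames for one joint, skipping occluded frames.

    query   : feature vector of the target frame (length d)      [assumed]
    keys    : T per-frame feature vectors (each of length d)      [assumed]
    values  : T per-frame joint estimates, e.g. (x, y)            [assumed]
    visible : T booleans, False where the joint is occluded       [assumed]
    """
    d = len(query)
    scores = []
    for key, is_visible in zip(keys, visible):
        if is_visible:
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(dot / math.sqrt(d))
        else:
            # Occluded frames get -inf so their softmax weight is exactly 0.
            scores.append(float("-inf"))

    top = max(scores)
    if top == float("-inf"):
        raise ValueError("joint is occluded in every frame")

    # Numerically stable softmax over the time axis.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the per-frame estimates.
    dim = len(values[0])
    refined = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return refined, weights


# Three frames; the joint is occluded in the last one, whose estimate is an outlier.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
values = [[10.0, 20.0], [12.0, 22.0], [100.0, 100.0]]
visible = [True, True, False]

refined, weights = masked_temporal_attention(query, keys, values, visible)
```

In this toy example the occluded frame receives zero weight, so its outlier estimate does not affect the refined position; the two visible frames have identical keys and are averaged equally.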
Original language | English |
---|---|
Title of host publication | 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3255-3260 |
Number of pages | 6 |
ISBN (Electronic) | 9781665452588 |
Publication status | Published - 2022 |
Event | 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Prague, Czech Republic; Duration: 2022 Oct 9 → 2022 Oct 12 |
Publication series
Name | Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics |
---|---|
Volume | 2022-October |
ISSN (Print) | 1062-922X |
Conference
Conference | 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 2022 Oct 9 → 2022 Oct 12 |
Bibliographical note
Funding Information: This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No. B0101-15-0266, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis; No. 2021-0-02068, Artificial Intelligence Innovation Hub).
Publisher Copyright:
© 2022 IEEE.
Keywords
- motion blur
- multi human pose estimation in video
- occlusion
- transformer
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Control and Systems Engineering
- Human-Computer Interaction