OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos

Kyung Min Jin, Gun Hee Lee, Seong Whan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)


Although many approaches to multi-human pose estimation in videos have achieved strong results, they require densely annotated data, which entails extensive manual labor. Furthermore, occlusion and motion blur inevitably degrade estimation performance. To address these problems, we propose a method that applies an attention mask to occluded joints and uses transformers to encode the temporal dependency between frames. First, our framework composes different combinations of sparsely annotated frames that represent the track of the overall joint movement. From these combinations we derive an occlusion attention mask that enables occlusion-aware heatmaps to be encoded as a semi-supervised task. Second, the proposed temporal encoder employs a transformer architecture to aggregate the temporal relationships and keypoint-wise attention from each time step and to refine the final pose estimate for the target frame. We achieve state-of-the-art pose estimation results on the PoseTrack2017 and PoseTrack2018 datasets and demonstrate the robustness of our approach to occlusion and motion blur in sparsely annotated video data.
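The abstract does not specify how the occlusion attention mask enters the transformer. As a rough illustration only, and not the OTPose implementation, the sketch below shows one generic way a visibility mask can gate temporal attention: scaled dot-product attention from a target-frame query over per-frame tokens for a single joint, where frames flagged as occluded receive zero weight. The function names (`masked_softmax`, `occlusion_masked_attention`), the per-frame boolean `visible` flags, and the pure-Python list representation are all hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def masked_softmax(scores: Sequence[float], visible: Sequence[bool]) -> List[float]:
    """Softmax over scores, giving exactly zero weight where visible is False."""
    kept = [s for s, v in zip(scores, visible) if v]
    if not kept:
        raise ValueError("at least one frame must be visible")
    m = max(kept)  # subtract the max of kept scores for numerical stability
    exps = [math.exp(s - m) if v else 0.0 for s, v in zip(scores, visible)]
    total = sum(exps)
    return [e / total for e in exps]


def occlusion_masked_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    visible: Sequence[bool],
) -> List[float]:
    """Attention of one target-frame query over T frame tokens for one joint.

    query: d floats; keys: T rows of d floats; values: T rows of equal length;
    visible: T bools (hypothetical per-frame occlusion flags for this joint).
    """
    d = len(query)
    # Scaled dot-product score between the query and each frame's key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = masked_softmax(scores, visible)
    # Weighted sum of value vectors; occluded frames contribute nothing.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    values = [[1.0], [100.0], [3.0]]  # middle frame's value is ignored below
    print(occlusion_masked_attention([1.0, 0.0], keys, values, [True, False, True]))
```

Because masked frames get exactly zero weight, changing an occluded frame's value vector leaves the output unchanged. A trained model would instead use batched tensors, learned query/key/value projections, and multiple heads.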

Original language: English
Title of host publication: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781665452588
Publication status: Published - 2022
Event: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Prague, Czech Republic
Duration: 2022 Oct 9 to 2022 Oct 12

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
ISSN (Print): 1062-922X


Conference: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022
Country/Territory: Czech Republic

Bibliographical note

Funding Information:
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. B0101-15-0266, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis; No. 2021-0-02068, Artificial Intelligence Innovation Hub).

Publisher Copyright:
© 2022 IEEE.


Keywords

  • motion blur
  • multi human pose estimation in video
  • occlusion
  • transformer

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Human-Computer Interaction

