Masked Kinematic Continuity-aware Hierarchical Attention Network for pose estimation in videos

Kyung Min Jin, Gun Hee Lee, Woo Jeoung Nam, Tae Kyung Kang, Hyun Woo Kim, Seong Whan Lee

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Existing methods for estimating human poses from video content exploit the temporal features of the video sequences and have shown impressive results. However, most methods address spatial and temporal issues separately: they compromise on accuracy to reduce jitter, or require high-resolution images to deal with occlusion, preventing full consideration of temporal features. Unfortunately, these two issues are interrelated. For example, occlusion causes uncertainty between successive frames, leading to unsmoothed results. To address these issues, we propose the Masked Kinematic Continuity-aware Hierarchical Attention Network (M-HANet), a novel framework that exploits masked kinematic keypoint features by extending our previous HANet framework. First, we randomly select and mask a keypoint, treating the masked keypoint as if it were occluded, which makes the network resilient to occlusion. We also use the velocity and acceleration of each individual keypoint to effectively capture temporal features. Second, the proposed hierarchical transformer encoder refines a 2D or 3D input pose derived from existing estimators by aggregating the masked continuity of the spatiotemporal dependencies of human motion. Finally, to facilitate collaborative optimization, we perform online cross-supervision between the final pose from our decoder and the refined input pose produced by our encoder. We validate the effectiveness of our model, demonstrating that our proposed approach improves PCK@0.05 by 14.1% and MPJPE by 8.7 mm compared to the existing method on a variety of tasks, including 2D and 3D pose estimation, body mesh recovery, and sparsely annotated multi-human pose estimation.
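The keypoint masking and kinematic features described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the zero-fill masking, and the finite-difference velocity/acceleration are assumptions about how "randomly select and mask a keypoint" and "velocity and acceleration of each individual keypoint" might be realized.

```python
import numpy as np

def mask_and_kinematics(poses, rng=None):
    """Hypothetical sketch of the abstract's two ingredients:
    (1) randomly mask one keypoint across the clip so the network
        learns to treat it as occluded;
    (2) derive per-keypoint velocity and acceleration as first and
        second temporal finite differences.

    poses: (T, K, 2) array of 2D keypoints over T frames.
    Returns (masked_poses, velocity (T-1, K, 2), acceleration (T-2, K, 2)).
    """
    rng = rng or np.random.default_rng(0)
    masked = poses.copy()
    k = rng.integers(poses.shape[1])        # keypoint chosen for "occlusion"
    masked[:, k, :] = 0.0                   # zero-fill stands in for the mask token
    velocity = np.diff(poses, n=1, axis=0)       # first temporal difference
    acceleration = np.diff(poses, n=2, axis=0)   # second temporal difference
    return masked, velocity, acceleration
```

For a keypoint moving at constant speed, the sketch yields a constant velocity and zero acceleration, which is the kind of temporal regularity the continuity-aware encoder is meant to exploit.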

Original language: English
Pages (from-to): 282-292
Number of pages: 11
Journal: Neural Networks
Publication status: Published - 2024 Jan

Bibliographical note

Publisher Copyright:
© 2023 Elsevier Ltd


Keywords

  • Body mesh recovery
  • Pose estimation
  • Transformer
  • Video understanding

ASJC Scopus subject areas

  • Cognitive Neuroscience
  • Artificial Intelligence


