MHCanonNet: Multi-Hypothesis Canonical lifting Network for self-supervised 3D human pose estimation in the wild video

  • Hyun Woo Kim
  • Gun Hee Lee
  • Woo Jeoung Nam
  • Kyung Min Jin
  • Tae Kyung Kang
  • Geon Jun Yang
  • Seong Whan Lee*
  • *Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Recent advances in fully supervised 3D human pose estimation have shown impressive results; however, these methods rely heavily on large amounts of annotated 3D data, which are difficult to obtain outside controlled laboratory environments. In this study, we therefore propose a new self-supervised method for training a 3D human pose estimation network on unlabeled multi-view videos. The model learns the relative depths between joints without any 3D annotation or camera calibration by enforcing multi-view consistency constraints, while simultaneously learning representations of multiple plausible pose hypotheses. For this reason, we call the proposed network the Multi-Hypothesis Canonical Lifting Network (MHCanonNet). By enriching the diversity of the extracted features and keeping multiple possibilities open, the network accurately estimates the final 3D pose. The key idea is a novel, unbiased reconstruction objective that combines multiple hypotheses from different viewpoints. The proposed approach achieves state-of-the-art results among self-supervised training methods on two popular benchmark datasets, Human3.6M and MPI-INF-3DHP, as well as on an in-the-wild dataset, Ski-Pose.
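
    The abstract does not give the objective itself, so the following is only a minimal sketch of what a cross-view reconstruction error over multiple hypotheses could look like. Everything in it is an assumption made for illustration: the function names, fusing hypotheses by simple averaging, a camera that differs from the canonical frame by a rotation about the vertical axis, and orthographic projection. It is not the authors' formulation.

```python
import math

def rotate_y(pose, angle):
    """Rotate a list of (x, y, z) joints about the vertical axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]

def project(pose):
    """Orthographic projection: drop depth (an assumed camera model)."""
    return [(x, y) for x, y, _ in pose]

def fuse(hypotheses):
    """Average K pose hypotheses joint by joint (an assumed fusion rule)."""
    k = len(hypotheses)
    return [
        tuple(sum(h[j][d] for h in hypotheses) / k for d in range(3))
        for j in range(len(hypotheses[0]))
    ]

def cross_view_error(hypotheses, angle_b, pose2d_b):
    """Mean absolute 2D error after re-projecting the fused canonical pose
    from one view into a second view B with (hypothetical) rotation angle_b."""
    reproj = project(rotate_y(fuse(hypotheses), angle_b))
    total = sum(
        abs(u - u2) + abs(v - v2)
        for (u, v), (u2, v2) in zip(reproj, pose2d_b)
    )
    return total / (2 * len(pose2d_b))
```

    In this toy setup, a depth error in the fused pose is invisible in the source view but shows up as a 2D error once the pose is rotated into view B, which is the intuition behind learning relative depth from multi-view consistency alone.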

    Original language: English
    Article number: 109908
    Journal: Pattern Recognition
    Volume: 145
    Publication status: Published - 2024 Jan

    Bibliographical note

    Publisher Copyright:
    © 2023 Elsevier Ltd

    Keywords

    • 3D human pose
    • Multi-view geometry
    • Self-supervised learning

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Computer Vision and Pattern Recognition
    • Artificial Intelligence
