Monocular Human Depth Estimation Via Pose Estimation

Jinyoung Jun, Jae Han Lee, Chul Lee, Chang Su Kim

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


We propose a novel monocular depth estimator that improves prediction accuracy on human regions by utilizing pose information. The proposed algorithm consists of two networks - PoseNet and DepthNet - which estimate keypoint heatmaps and a depth map, respectively. We incorporate the pose information from PoseNet to improve the depth estimation performance of DepthNet. Specifically, we develop a feature blending block, which fuses the features from PoseNet and DepthNet and feeds them into the next layer of DepthNet, so that the networks learn to predict the depths of human regions more accurately. Furthermore, we develop a novel joint training scheme for partially labeled datasets, which balances multiple loss functions effectively by adjusting their weights. Experimental results demonstrate that the proposed algorithm improves depth estimation performance significantly, especially around human regions. For example, on the proposed HD + P dataset, the proposed algorithm improves the depth estimation performance of ResNet-50 on human regions by 2.8% and 7.0% in terms of $\delta_{1}$ and RMSE, respectively.
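The abstract describes a feature blending block that fuses intermediate PoseNet and DepthNet features before passing them to the next DepthNet layer. The paper's exact block design is not given here; the sketch below is a hypothetical minimal version, assuming the fusion is channel-wise concatenation followed by a 1x1 convolution (modeled in NumPy as a per-pixel linear mix of channels). The function name `blend_features` and all shapes are illustrative, not from the paper.

```python
import numpy as np

def blend_features(depth_feat, pose_feat, weight):
    """Hypothetical feature blending: fuse a DepthNet feature map with
    PoseNet keypoint features via channel concatenation + 1x1 conv.

    depth_feat: (Cd, H, W) DepthNet features
    pose_feat:  (Cp, H, W) PoseNet features (e.g. keypoint heatmaps)
    weight:     (Cout, Cd + Cp) 1x1 convolution weights
    Returns:    (Cout, H, W) fused features for the next DepthNet layer
    """
    # Stack both feature maps along the channel dimension.
    fused = np.concatenate([depth_feat, pose_feat], axis=0)  # (Cd+Cp, H, W)
    # A 1x1 convolution is a linear map applied at every spatial location.
    return np.einsum('oc,chw->ohw', weight, fused)

rng = np.random.default_rng(0)
d = rng.standard_normal((32, 8, 8))   # illustrative DepthNet features
p = rng.standard_normal((17, 8, 8))   # illustrative PoseNet heatmap features
w = rng.standard_normal((32, 32 + 17))
out = blend_features(d, p, w)
print(out.shape)
```

In a real implementation the 1x1 convolution would be a learned layer trained jointly with both networks, so the blend weights adapt to emphasize pose cues around human regions.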

Original language: English
Pages (from-to): 151444-151457
Number of pages: 14
Journal: IEEE Access
Publication status: Published - 2021

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) funded by the Government of Korea (MSIT) under Grant NRF-2018R1A2B3003896, Grant NRF-2019R1A2C4069806, and Grant NRF-2021R1A4A1031864.

Publisher Copyright:
© 2013 IEEE.


Keywords

  • Monocular depth estimation
  • Human depth estimation
  • Human pose estimation
  • Loss rebalancing strategy

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering


