TY - JOUR
T1 - MetricUNet
T2 - Synergistic image- and voxel-level learning for precise prostate segmentation via online sampling
AU - He, Kelei
AU - Lian, Chunfeng
AU - Adeli, Ehsan
AU - Huo, Jing
AU - Gao, Yang
AU - Zhang, Bing
AU - Zhang, Junfeng
AU - Shen, Dinggang
N1 - Funding Information:
This work was supported by a grant from the National Key R&D Program of China (No. 2019YFC0118300). This work was also supported by grants from the Provincial Key R&D Program of Jiangsu (Nos. BE2020620, BE2020723, BE2018610).
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/7
Y1 - 2021/7
N2 - Fully convolutional networks (FCNs), including UNet and VNet, are widely used network architectures for semantic segmentation in recent studies. However, conventional FCNs are typically trained with the cross-entropy or Dice loss, which only calculates the error between predictions and ground-truth labels for each pixel individually. This often results in non-smooth neighborhoods in the predicted segmentation. This problem becomes more serious in CT prostate segmentation, as CT images usually have low tissue contrast. To address this problem, we propose a two-stage framework, in which the first stage quickly localizes the prostate region and the second stage precisely segments the prostate with a multi-task UNet architecture. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. The proposed network therefore has a dual-branch architecture that tackles two tasks: (1) a segmentation sub-network that generates the prostate segmentation, and (2) a voxel-metric learning sub-network that improves the quality of the learned feature space under the supervision of a metric loss. Specifically, the voxel-metric learning sub-network samples tuples (including triplets and pairs) at the voxel level from the intermediate feature maps. Unlike conventional deep metric learning methods that generate triplets or pairs at the image level before the training phase, our voxel-wise tuples are sampled online and optimized in an end-to-end fashion via multi-task learning. To evaluate the proposed method, we conduct extensive experiments on a real CT image dataset consisting of 339 patients. The ablation studies show that our method learns more representative voxel-level features than conventional training with the cross-entropy or Dice loss, and the comparisons show that the proposed method outperforms the state-of-the-art methods by a reasonable margin.
AB - Fully convolutional networks (FCNs), including UNet and VNet, are widely used network architectures for semantic segmentation in recent studies. However, conventional FCNs are typically trained with the cross-entropy or Dice loss, which only calculates the error between predictions and ground-truth labels for each pixel individually. This often results in non-smooth neighborhoods in the predicted segmentation. This problem becomes more serious in CT prostate segmentation, as CT images usually have low tissue contrast. To address this problem, we propose a two-stage framework, in which the first stage quickly localizes the prostate region and the second stage precisely segments the prostate with a multi-task UNet architecture. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. The proposed network therefore has a dual-branch architecture that tackles two tasks: (1) a segmentation sub-network that generates the prostate segmentation, and (2) a voxel-metric learning sub-network that improves the quality of the learned feature space under the supervision of a metric loss. Specifically, the voxel-metric learning sub-network samples tuples (including triplets and pairs) at the voxel level from the intermediate feature maps. Unlike conventional deep metric learning methods that generate triplets or pairs at the image level before the training phase, our voxel-wise tuples are sampled online and optimized in an end-to-end fashion via multi-task learning. To evaluate the proposed method, we conduct extensive experiments on a real CT image dataset consisting of 339 patients. The ablation studies show that our method learns more representative voxel-level features than conventional training with the cross-entropy or Dice loss, and the comparisons show that the proposed method outperforms the state-of-the-art methods by a reasonable margin.
KW - Contrast learning
KW - Fully convolutional networks
KW - Metric learning
KW - Prostate cancer
KW - Sampling
KW - Triplet
UR - http://www.scopus.com/inward/record.url?scp=85103695025&partnerID=8YFLogxK
U2 - 10.1016/j.media.2021.102039
DO - 10.1016/j.media.2021.102039
M3 - Article
C2 - 33831595
AN - SCOPUS:85103695025
SN - 1361-8415
VL - 71
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 102039
ER -