Multi-View Spatial Aggregation Framework for Joint Localization and Segmentation of Organs at Risk in Head and Neck CT Images

Shujun Liang, Kim Han Thung, Dong Nie, Yu Zhang, Dinggang Shen

Research output: Contribution to journal › Article › peer-review

32 Citations (Scopus)


Accurate segmentation of organs at risk (OARs) from head and neck (HN) CT images is crucial for effective HN cancer radiotherapy. However, existing deep learning methods are often not trained in an end-to-end fashion, i.e., they independently predetermine the regions of target organs before organ segmentation, causing limited information sharing between related tasks and thus leading to suboptimal segmentation results. Furthermore, when a conventional segmentation network is used to segment all the OARs simultaneously, the results often favor large OARs over small ones. Thus, existing methods often train a specific model for each OAR, ignoring the correlation between different segmentation tasks. To address these issues, we propose a new multi-view spatial aggregation framework for joint localization and segmentation of multiple OARs using HN CT images. The core of our framework is a region-of-interest (ROI)-based fine-grained representation convolutional neural network (CNN), which is used to generate multi-OAR probability maps from each 2D view (i.e., axial, coronal, and sagittal view) of CT images. Specifically, our ROI-based fine-grained representation CNN (1) unifies the OAR localization and segmentation tasks and trains them in an end-to-end fashion, and (2) improves the segmentation of various-sized OARs via a novel ROI-based fine-grained representation. Our multi-view spatial aggregation framework then spatially aggregates and assembles the generated multi-view multi-OAR probability maps to segment all the OARs simultaneously. We evaluate our framework using two sets of HN CT images and achieve competitive and highly robust segmentation performance for OARs of various sizes.
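The fusion step described in the abstract — combining per-view multi-OAR probability maps into a single multi-organ segmentation — can be illustrated with a minimal sketch. The function below is an assumption for illustration only: it uses simple voxel-wise averaging followed by an argmax as a stand-in for the paper's spatial aggregation, and the function name and array layout are hypothetical, not from the published method.

```python
import numpy as np

def aggregate_multiview_probmaps(axial, coronal, sagittal):
    """Fuse per-view multi-OAR probability maps into one label volume.

    Each input is a (C, D, H, W) array of per-class probabilities,
    predicted slice-by-slice from one 2D view (axial, coronal, or
    sagittal) and re-stacked into a common volume orientation.
    Voxel-wise averaging is used here as a simple placeholder for
    the paper's spatial aggregation step.
    """
    fused = (axial + coronal + sagittal) / 3.0  # voxel-wise mean over views
    return fused.argmax(axis=0)                 # (D, H, W) label map, one OAR id per voxel
```

In practice, the per-view predictions must first be resampled into a shared volume orientation before fusion; more elaborate aggregation (e.g., learned weighting of views) would replace the plain mean.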

Original language: English
Article number: 9007407
Pages (from-to): 2794-2805
Number of pages: 12
Journal: IEEE Transactions on Medical Imaging
Issue number: 9
Publication status: Published - Sept 2020

Bibliographical note

Funding Information:
Manuscript received December 20, 2019; revised February 14, 2020; accepted February 17, 2020. Date of publication February 24, 2020; date of current version August 31, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61671230 and Grant 61971213, and in part by the Basic and Applied Basic Research Foundation of Guangdong Province under Grant 2019A1515010417. (Corresponding authors: Yu Zhang; Dinggang Shen.) Shujun Liang is with the School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China, also with the Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China, and also with the Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA (e-mail:

Publisher Copyright:
© 1982-2012 IEEE.


Keywords

  • Image segmentation
  • convolutional neural network
  • deep learning
  • detection
  • head and neck cancer

ASJC Scopus subject areas

  • Software
  • Radiological and Ultrasound Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering


