Proactive congestion avoidance for distributed deep learning

Minkoo Kang, Gyeongsik Yang, Yeonho Yoo, Chuck Yoo

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


This paper presents “Proactive Congestion Notification” (PCN), a congestion-avoidance technique for distributed deep learning (DDL). DDL is widely used to scale out and accelerate deep neural network training. In DDL, each worker trains a copy of the deep learning model with different training inputs and synchronizes the model gradients at the end of each iteration. However, it is well known that the network communication for synchronizing model parameters is the main bottleneck in DDL. Our key observation is that the DDL architecture makes each worker generate burst traffic every iteration, which causes network congestion and in turn degrades the throughput of DDL traffic. Based on this observation, the key idea behind PCN is to prevent potential congestion by proactively regulating the switch queue length before DDL burst traffic arrives at the switch, which prepares the switches for handling incoming DDL bursts. In our evaluation, PCN improves the throughput of DDL traffic by 72% on average.
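The idea in the abstract, draining the switch queue before a periodic burst arrives, can be illustrated with a toy model. The sketch below is not the authors' P4 implementation. The tick-based queue, every parameter value, and the mechanism of throttling background traffic during a lead window are assumptions made only for illustration.

```python
def simulate(proactive, ticks=200, capacity=100, drain=10,
             background=12, throttled=6, burst=60, period=20, lead=10):
    """Toy switch queue: return how many burst packets are dropped.

    All names and numbers here are hypothetical; they are not taken
    from the paper.
    """
    queue = 0
    dropped = 0
    for t in range(ticks):
        phase = t % period
        # Assumption: during the `lead` ticks before each burst, background
        # senders are slowed down so the queue has room when the burst lands.
        in_lead_window = proactive and phase >= period - lead
        rate = throttled if in_lead_window else background
        # Background packets that do not fit are silently discarded.
        queue = min(capacity, queue + rate)
        if phase == 0:
            # The periodic burst arrives in a single tick.
            admitted = min(burst, capacity - queue)
            dropped += burst - admitted
            queue += admitted
        # The switch forwards up to `drain` packets per tick.
        queue -= min(queue, drain)
    return dropped


if __name__ == "__main__":
    print("burst packets dropped without regulation:", simulate(proactive=False))
    print("burst packets dropped with regulation:   ", simulate(proactive=True))
```

In this toy setting the regulated queue drops fewer burst packets because it is partly empty when each burst arrives. The numbers say nothing about the throughput gain reported in the paper.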

Original language: English
Article number: 174
Pages (from-to): 1-18
Number of pages: 18
Journal: Sensors (Switzerland)
Issue number: 1
Publication status: Published - 2021 Jan 1

Bibliographical note

Funding Information:
This work was partly supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2015-0-00280, (SW Starlab) Next generation cloud infra-software toward the guarantee of performance and security SLA). This research was also supported by Next Generation Engineering Researcher Program of National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF-2019H1D8A2105513).

Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.


Keywords

  • Congestion avoidance
  • Deep learning
  • Distributed deep learning
  • Network congestion
  • P4
  • Proactive congestion notification

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Atomic and Molecular Physics, and Optics
  • Biochemistry
  • Instrumentation
  • Electrical and Electronic Engineering


