Prediction of the Resource Consumption of Distributed Deep Learning Systems

Gyeongsik Yang, Changyong Shin, Jeunghwan Lee, Yeonho Yoo, Chuck Yoo

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


Predicting the resource consumption of distributed training of deep learning models is of paramount importance: it can tell users a priori how long their training would take and enable them to manage the cost of training. Yet, no such prediction is available to users, because resource consumption varies significantly with "settings," such as GPU types, and with "workloads," such as deep learning models. Previous studies have aimed to derive or model such a prediction, but they fall short of accommodating the various combinations of settings and workloads together. This study presents Driple, which designs graph neural networks to predict the resource consumption of diverse workloads. Driple also designs transfer learning to extend these graph neural networks so that they adapt to differences in settings. The evaluation results show that Driple can effectively predict a wide range of workloads and settings. At the same time, Driple can reduce the time required to tailor the prediction to different settings by up to 7.3×.
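The abstract describes the core idea of modeling a deep learning workload as a graph and using a graph neural network to predict its resource consumption. The paper's actual architecture is not reproduced here; the following is a hypothetical, minimal sketch of that idea in plain Python, where an operator graph with per-node features (illustrative GFLOPs and parameter counts) is aggregated by one round of message passing and a linear readout — all weights and features are made-up assumptions for illustration.

```python
# Hypothetical sketch of GNN-style resource prediction (not Driple's
# actual architecture): each node is an operator of the model's
# computation graph; features are illustrative [GFLOPs, params-in-M].

def message_passing(features, edges, w_self=0.5, w_neigh=0.5):
    """One round of mean-aggregation message passing over the op graph."""
    dim = len(next(iter(features.values())))
    out = {}
    for node, feat in features.items():
        # Gather features of nodes with an edge into this node.
        neigh = [features[src] for src, dst in edges if dst == node]
        if neigh:
            mean = [sum(vals) / len(neigh) for vals in zip(*neigh)]
        else:
            mean = [0.0] * dim
        # Combine a node's own features with its neighbors' mean.
        out[node] = [w_self * f + w_neigh * m for f, m in zip(feat, mean)]
    return out

def readout(embeddings, weights):
    """Graph-level readout: sum node embeddings, then a linear prediction."""
    dim = len(weights)
    pooled = [0.0] * dim
    for feat in embeddings.values():
        pooled = [p + f for p, f in zip(pooled, feat)]
    return sum(w * p for w, p in zip(weights, pooled))

# Toy operator graph: conv -> relu -> fc (features are assumptions).
features = {"conv": [2.0, 0.1], "relu": [0.0, 0.0], "fc": [0.5, 4.0]}
edges = [("conv", "relu"), ("relu", "fc")]

embedded = message_passing(features, edges)
# Hypothetical readout weights; in practice these would be learned.
predicted_step_time = readout(embedded, weights=[10.0, 1.0])
print(round(predicted_step_time, 3))
```

In the same spirit, the transfer-learning component of the paper would correspond to keeping the message-passing stage and re-fitting only the readout weights for a new setting (e.g., a different GPU type), rather than training from scratch.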

Original language: English
Article number: 3530895
Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Issue number: 2
Publication status: Published - 2022 Jun

Bibliographical note

Funding Information:
We thank our shepherd, Sergey Blagodurov, and the anonymous reviewers for their insightful comments that helped us to improve this study. This work was supported by Institute of Information & communications Technology Planning & Evaluation grant funded by the Korea government (Ministry of Science and ICT, MSIT) (2015-0-00280, (SW Starlab) Next generation cloud infra-software toward the guarantee of performance and security SLA). This research was also partly supported by Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education (NRF-2021R1A6A1A13044830), and a Korea University Grant.

Publisher Copyright:
© 2022 ACM.


Keywords

  • Distributed deep learning
  • Graph neural networks
  • Resource prediction
  • Training time prediction
  • Transfer learning

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Safety, Risk, Reliability and Quality
  • Hardware and Architecture
  • Computer Networks and Communications


