Pipelined Stochastic Gradient Descent with Taylor Expansion

Bongwon Jang, Inchul Yoo, Dongsuk Yook

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Stochastic gradient descent (SGD) is an optimization method widely used in deep learning to train deep neural network (DNN) models. Recent studies on DNN training have proposed pipeline parallelism, a type of model parallelism, to accelerate SGD training. However, because SGD is inherently sequential, naively implemented pipeline parallelism introduces the weight inconsistency and delayed gradient problems, which reduce training efficiency. In this study, we propose a novel method called TaylorPipe to alleviate these problems. The proposed method generates multiple model replicas to solve the weight inconsistency problem and adopts a Taylor expansion-based gradient prediction algorithm to mitigate the delayed gradient problem. We verified the efficiency of the proposed method using VGG-16 and ResNet-34 on the CIFAR-10 and CIFAR-100 datasets. The experimental results show that TaylorPipe reduces training time by a factor of up to 2.7 while achieving accuracy comparable to that of SGD.
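
The abstract does not state the prediction formula, so the following is only a minimal sketch of the general idea behind a Taylor expansion-based correction of a delayed gradient. It is not the TaylorPipe algorithm itself. It assumes a toy quadratic loss whose diagonal Hessian is known exactly, and the names `grad` and `taylor_predict_grad` are hypothetical.

```python
# Toy illustration (assumption, not the paper's method): correct a stale
# gradient with a first-order Taylor expansion around the stale weights.

def grad(w, a):
    """Gradient of the toy loss f(w) = 0.5 * sum(a_i * w_i**2)."""
    return [ai * wi for ai, wi in zip(a, w)]

def taylor_predict_grad(stale_grad, hess_diag, w_stale, w_now):
    """Estimate g(w_now) as g(w_stale) + H_diag * (w_now - w_stale)."""
    return [
        g + h * (wn - ws)
        for g, h, ws, wn in zip(stale_grad, hess_diag, w_stale, w_now)
    ]

a = [1.0, 4.0]          # curvature of the toy loss, i.e. its diagonal Hessian
w_stale = [2.0, -1.0]   # weights used when the delayed gradient was computed
w_now = [1.5, -0.5]     # weights after intervening updates

stale = grad(w_stale, a)                                    # delayed gradient
predicted = taylor_predict_grad(stale, a, w_stale, w_now)   # corrected estimate
true_now = grad(w_now, a)                                   # reference value

print("stale:", stale)
print("predicted:", predicted)
print("true:", true_now)
```

For a quadratic loss the first-order expansion of the gradient is exact, so the corrected estimate matches the gradient at the current weights. For a real DNN the Hessian is not available in closed form and would have to be approximated, so the correction would only reduce the error rather than remove it.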

Original language: English
Article number: 11730
Journal: Applied Sciences (Switzerland)
Volume: 13
Issue number: 21
Publication status: Published - 2023 Nov

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Keywords

  • deep learning
  • parallel processing
  • pipeline processing
  • stochastic gradient descent (SGD)

ASJC Scopus subject areas

  • General Materials Science
  • Instrumentation
  • General Engineering
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes
