Pipelined Stochastic Gradient Descent with Taylor Expansion

Bongwon Jang, Inchul Yoo, Dongsuk Yook

    Research output: Contribution to journal › Article › peer-review

    2 Citations (Scopus)

    Abstract

    Stochastic gradient descent (SGD) is an optimization method typically used in deep learning to train deep neural network (DNN) models. In recent studies on DNN training, pipeline parallelism, a type of model parallelism, has been proposed to accelerate SGD training. However, since SGD is inherently sequential, naively implemented pipeline parallelism introduces the weight inconsistency and delayed gradient problems, resulting in reduced training efficiency. In this study, we propose a novel method called TaylorPipe to alleviate these problems. The proposed method generates multiple model replicas to solve the weight inconsistency problem, and adopts a Taylor expansion-based gradient prediction algorithm to mitigate the delayed gradient problem. We verified the efficiency of the proposed method using VGG-16 and ResNet-34 on the CIFAR-10 and CIFAR-100 datasets. The experimental results show that not only is the training time reduced by up to 2.7 times, but the accuracy of TaylorPipe is also comparable to that of SGD.
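    The abstract describes the core idea at a high level: in a pipeline, the gradient applied at a given step was computed from weights that are several updates stale, and TaylorPipe compensates by predicting the current gradient from the stale one via a Taylor expansion. The sketch below illustrates one plausible reading of that idea, a first-order Taylor extrapolation of a delayed gradient followed by a plain SGD step. It is an illustrative assumption based only on the abstract, not the authors' published algorithm, and the function and parameter names are hypothetical.

    import numpy as np

    def taylor_predicted_update(w, g_stale, g_prev_stale, delay, lr):
        # w            : current weights (NumPy array)
        # g_stale      : gradient computed from weights `delay` steps old
        # g_prev_stale : the stale gradient from the previous step
        # delay        : pipeline-induced staleness, in update steps
        # lr           : learning rate
        #
        # First-order Taylor expansion of the gradient in time:
        #   g(t) ~= g(t - delay) + delay * (g(t - delay) - g(t - delay - 1))
        g_pred = g_stale + delay * (g_stale - g_prev_stale)
        # Standard SGD step, but using the predicted (less stale) gradient.
        return w - lr * g_pred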

    Original language: English
    Article number: 11730
    Journal: Applied Sciences (Switzerland)
    Volume: 13
    Issue number: 21
    DOIs
    Publication status: Published - 2023 Nov

    Bibliographical note

    Publisher Copyright:
    © 2023 by the authors.

    Keywords

    • deep learning
    • parallel processing
    • pipeline processing
    • stochastic gradient descent (SGD)

    ASJC Scopus subject areas

    • General Materials Science
    • Instrumentation
    • General Engineering
    • Process Chemistry and Technology
    • Computer Science Applications
    • Fluid Flow and Transfer Processes
