Training Deep Neural Networks (DNNs) typically requires a large amount of training data, which makes parallel training necessary. Stochastic Gradient Descent (SGD) is one of the most widely used algorithms for training DNNs. However, because SGD is an inherently sequential process, parallelizing it requires some form of approximation. In this paper, we review various efforts to parallelize SGD and analyze their computational overhead, communication overhead, and the effects of the approximations.
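To make the sequential dependency concrete, the following is a minimal sketch, not taken from the paper, that contrasts per-sample SGD with one common parallelization scheme, synchronous data parallelism. Everything in it is an assumption made for illustration: the one-parameter least-squares model, the function names, the learning rate, and the workers, which are simulated in an ordinary loop rather than run on separate devices.

```python
import random


def grad(w, x, y):
    # Gradient of the per-sample loss 0.5 * (w * x - y) ** 2 with respect to w.
    return (w * x - y) * x


def sequential_sgd(data, w, lr):
    # Each update reads the w written by the previous update, so the
    # iterations of this loop cannot be handed to different workers as is.
    for x, y in data:
        w -= lr * grad(w, x, y)
    return w


def data_parallel_step(shards, w, lr):
    # Every simulated worker evaluates its own shard at the SAME w.
    # The list comprehension stands in for work done in parallel.
    local = [sum(grad(w, x, y) for x, y in shard) / len(shard) for shard in shards]
    # Averaging the local gradients stands in for the communication step.
    return w - lr * sum(local) / len(local)


def data_parallel_sgd(data, w, lr, workers, steps):
    # Split the data into equally sized shards, one per simulated worker.
    size = len(data) // workers
    shards = [data[i * size:(i + 1) * size] for i in range(workers)]
    for _ in range(steps):
        w = data_parallel_step(shards, w, lr)
    return w


random.seed(0)
true_w = 3.0  # hypothetical target weight for the toy problem
data = [(x, true_w * x) for x in (random.uniform(-1.0, 1.0) for _ in range(64))]

w_seq = sequential_sgd(data, 0.0, 0.1)
w_par = data_parallel_sgd(data, 0.0, 0.1, workers=4, steps=64)
print(w_seq, w_par)
```

With equally sized shards, the averaged worker gradient equals the gradient of the combined mini-batch, so this scheme amounts to large-batch SGD: it removes the dependency between per-sample updates at the cost of one gradient exchange per step, and it follows a different trajectory from the purely sequential loop.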
Bibliographical note
Publisher Copyright:
Copyright © 2020 The Acoustical Society of Korea.
Keywords
- Deep Neural Network (DNN)
- Deep learning
- Parallel processing
- Stochastic Gradient Descent (SGD)
ASJC Scopus subject areas
- Signal Processing
- Acoustics and Ultrasonics
- Applied Mathematics
- Speech and Hearing