Abstract
TensorExpress provides in-network communication scheduling for distributed deep learning (DDL). In cloud-based DDL, parameter communication over the network is a key bottleneck. Previous studies proposed tensor packet reordering approaches to reduce network blocking time, but network contention still remains. TensorExpress mitigates this contention and reduces overall training time by scheduling tensor packets in-network using P4, a switch programming language. TensorExpress improves latency and network blocking time by up to 2.5 and 2.44 times, respectively.
Original language | English |
---|---|
Title of host publication | Proceedings - 2020 IEEE 13th International Conference on Cloud Computing, CLOUD 2020 |
Publisher | IEEE Computer Society |
Pages | 25-27 |
Number of pages | 3 |
ISBN (Electronic) | 9781728187808 |
Publication status | Published - 2020 Oct |
Event | 13th IEEE International Conference on Cloud Computing, CLOUD 2020 - Virtual, Beijing, China. Duration: 2020 Oct 18 → 2020 Oct 24 |
Publication series
Name | IEEE International Conference on Cloud Computing, CLOUD |
---|---|
Volume | 2020-October |
ISSN (Print) | 2159-6182 |
ISSN (Electronic) | 2159-6190 |
Conference
Conference | 13th IEEE International Conference on Cloud Computing, CLOUD 2020 |
---|---|
Country/Territory | China |
City | Virtual, Beijing |
Period | 2020 Oct 18 → 2020 Oct 24 |
Bibliographical note
Funding Information: This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation grant funded by the Korea government (No. 2015-0-00280, (SW Starlab) Next generation cloud infra-software toward the guarantee of performance and security SLA). This research was also supported by the National Research Foundation of Korea funded by the Ministry of Science, ICT (No. NRF-2019H1D8A2105513).
Publisher Copyright:
© 2020 IEEE.
Keywords
- Communication scheduling
- Distributed deep learning
- In-network delay
- P4
- Parameter server architecture
ASJC Scopus subject areas
- Artificial Intelligence
- Information Systems
- Software