TY - GEN
T1 - TensorExpress
T2 - 13th IEEE International Conference on Cloud Computing, CLOUD 2020
AU - Kang, Minkoo
AU - Yang, Gyeongsik
AU - Yoo, Yeonho
AU - Yoo, Chuck
N1 - Funding Information:
This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation grant funded by the Korea government (No. 2015-0-00280, (SW Starlab) Next generation cloud infra-software toward the guarantee of performance and security SLA). This research was also supported by the National Research Foundation of Korea, funded by the Ministry of Science and ICT (No. NRF-2019H1D8A2105513).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - TensorExpress provides in-network communication scheduling for distributed deep learning (DDL). In cloud-based DDL, parameter communication over a network is a key bottleneck. Previous studies proposed tensor packet reordering approaches to reduce network blocking time. However, network contention still exists in DDL. TensorExpress mitigates network contention and reduces overall training time. It schedules tensor packets in-network using P4, a switch programming language. TensorExpress improves latency and network blocking time by up to 2.5 and 2.44 times, respectively.
AB - TensorExpress provides in-network communication scheduling for distributed deep learning (DDL). In cloud-based DDL, parameter communication over a network is a key bottleneck. Previous studies proposed tensor packet reordering approaches to reduce network blocking time. However, network contention still exists in DDL. TensorExpress mitigates network contention and reduces overall training time. It schedules tensor packets in-network using P4, a switch programming language. TensorExpress improves latency and network blocking time by up to 2.5 and 2.44 times, respectively.
KW - Communication scheduling
KW - Distributed deep learning
KW - In-network delay
KW - P4
KW - Parameter server architecture
UR - http://www.scopus.com/inward/record.url?scp=85098592953&partnerID=8YFLogxK
U2 - 10.1109/CLOUD49709.2020.00014
DO - 10.1109/CLOUD49709.2020.00014
M3 - Conference contribution
AN - SCOPUS:85098592953
T3 - IEEE International Conference on Cloud Computing, CLOUD
SP - 25
EP - 27
BT - Proceedings - 2020 IEEE 13th International Conference on Cloud Computing, CLOUD 2020
PB - IEEE Computer Society
Y2 - 18 October 2020 through 24 October 2020
ER -