Abstract
Distributed machine learning is an effective way to alleviate the intensive computation cost of training; however, it suffers from network bottlenecks when gathering local results. The recent advent of programmable data planes has opened a new avenue, in-network aggregation, which aggregates gradients inside the network, resolving network bottlenecks and further accelerating distributed machine learning. However, because current programmable data planes are resource-constrained, installing in-network aggregation functionality throughout the network would impose an unacceptable burden, which calls for a sophisticated deployment strategy. In this paper, we consider the problem of deploying in-network aggregation functionality so as to minimize the total network traffic in multi-tenant distributed machine learning. Since the formulated problem is an integer linear programming problem, which is known to be NP-hard, we propose a traffic-aware placement of in-network aggregation (TAPINA) algorithm with lower complexity and near-optimal performance. TAPINA decides the aggregation points of multiple tenants sequentially, in order of their expected traffic, and reuses aggregation points already selected by other tenants to reduce the overall deployment cost. Simulation results demonstrate that TAPINA achieves near-optimal performance, with up to 20% traffic reduction compared to the state-of-the-art algorithm in most cases.
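The sequential, reuse-oriented placement idea described in the abstract can be illustrated with a small greedy sketch. This is not the TAPINA algorithm itself, whose details are not given here. Everything in the block is an assumption made for illustration: the toy tree topology (`PARENT`), the tenant-to-worker mapping (`TENANTS`), the cost model (one gradient flow per worker sent to the root, with flows of a tenant merging into one at an aggregation-enabled switch), and a deployment budget expressed as a maximum number of enabled switches that all tenants may share.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy tree (child -> parent), rooted at "core".
PARENT: Dict[str, str] = {
    "agg1": "core", "agg2": "core",
    "tor1": "agg1", "tor2": "agg1", "tor3": "agg2",
    "w1": "tor1", "w2": "tor1", "w3": "tor1",
    "w4": "tor2", "w5": "tor2", "w6": "tor3",
}
ROOT = "core"

# Derive the parent -> children map; every node with children is a switch.
CHILDREN: Dict[str, list] = {}
for _child, _parent in PARENT.items():
    CHILDREN.setdefault(_parent, []).append(_child)
SWITCHES = sorted(CHILDREN)

# Hypothetical tenants and the workers each one trains on.
TENANTS: Dict[str, Set[str]] = {
    "A": {"w1", "w2", "w4"},
    "B": {"w3", "w5"},
}


def _traffic(node: str, workers: Set[str], enabled: Set[str]) -> Tuple[int, int]:
    """Return (flows leaving `node` upward, link traversals in its subtree)."""
    if node in workers:
        flows, total = 1, 0
    else:
        flows, total = 0, 0
        for child in CHILDREN.get(node, []):
            f, t = _traffic(child, workers, enabled)
            flows += f
            total += t
        # An aggregation-enabled switch merges a tenant's flows into one.
        if node in enabled and flows > 1:
            flows = 1
    if node in PARENT:  # count the uplink of every non-root node
        total += flows
    return flows, total


def cost(workers: Set[str], enabled: Set[str]) -> int:
    """Total link traversals for one tenant under a given placement."""
    return _traffic(ROOT, workers, enabled)[1]


def place(tenants: Dict[str, Set[str]], budget: int) -> Tuple[Set[str], int]:
    """Greedy sketch: serve tenants in descending order of expected traffic,
    reuse already-enabled switches for free, and enable a new switch only
    while the budget allows and the tenant's traffic strictly decreases."""
    enabled: Set[str] = set()
    order = sorted(tenants, key=lambda t: (-cost(tenants[t], set()), t))
    for name in order:
        workers = tenants[name]
        while len(enabled) < budget:
            current = cost(workers, enabled)
            candidates = [s for s in SWITCHES if s not in enabled]
            if not candidates:
                break
            # Ties are broken by switch name to keep the result deterministic.
            best = min(candidates, key=lambda s: (cost(workers, enabled | {s}), s))
            if cost(workers, enabled | {best}) >= current:
                break
            enabled.add(best)
    total = sum(cost(w, enabled) for w in tenants.values())
    return enabled, total
```

In this toy instance, `place(TENANTS, budget=1)` enables only `agg1`: tenant A selects it first, tenant B then reuses it at no extra deployment cost, and the total drops from 15 to 12 link traversals. An exact solution would instead solve the integer linear program mentioned in the abstract; the greedy loop above only approximates it under the stated assumptions.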
Original language | English |
---|---|
Title of host publication | ICCCN 2023 - 2023 32nd International Conference on Computer Communications and Networks |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350336184 |
DOIs | |
Publication status | Published - 2023 |
Event | 32nd International Conference on Computer Communications and Networks, ICCCN 2023 - Honolulu, United States. Duration: 2023 Jul 24 → 2023 Jul 27 |
Publication series
Name | Proceedings - International Conference on Computer Communications and Networks, ICCCN |
---|---|
Volume | 2023-July |
ISSN (Print) | 1095-2055 |
Conference
Conference | 32nd International Conference on Computer Communications and Networks, ICCCN 2023 |
---|---|
Country/Territory | United States |
City | Honolulu |
Period | 23/7/24 → 23/7/27 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Distributed Machine Learning
- In-Network Aggregation
- Programmable data plane
ASJC Scopus subject areas
- Computer Networks and Communications
- Hardware and Architecture
- Software