Abstract
Split computing is a promising approach to reducing the inference latency of deep neural network (DNN) models. In this article, we propose a two-phase split computing framework (TSCF). In TSCF, for vertical interlayer splitting between computing nodes at different levels (e.g., central and edge clouds), a shortest path problem in a directed graph is formulated and a pruning-based low-complexity solution is devised. In addition, for horizontal intralayer splitting between computing nodes at the same level (e.g., edge clouds), the execution units of a specific layer are further divided and distributed to the computing nodes at that level in proportion to their available resources. The evaluation results demonstrate that TSCF can reduce inference latency by more than 38.8% compared with the traditional interlayer splitting scheme by efficiently using the resources of distributed computing nodes. In addition, it is demonstrated that near-optimal inference latency can be achieved even with the pruning-based low-complexity solution.
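The two phases described in the abstract can be pictured with a small sketch. The Python below is illustrative only and is not the paper's implementation: the per-layer costs, the two-tier (edge/central cloud) setup, and the function names `best_vertical_split` and `proportional_shares` are assumptions made for this example. Phase one enumerates interlayer cuts of a chain-shaped DNN, which corresponds to a shortest path over a layered directed graph (a pruning-based solver would discard dominated partial paths instead of scanning every cut); phase two divides one layer's execution units among same-level edge nodes in proportion to their available resources.

```python
import math


def best_vertical_split(edge_exec, cloud_exec, transfer, input_transfer=0.0):
    """Pick the interlayer cut of a chain-structured DNN that minimizes latency.

    Layers [0, k) run on the edge cloud and layers [k, n) on the central cloud;
    crossing the cut pays the cost of shipping the intermediate activations
    uplink. Enumerating every cut is equivalent to a shortest path in the
    layered graph whose node (i, tier) means "layer i runs on tier".
    """
    n = len(edge_exec)
    best_latency, best_cut = math.inf, None
    for k in range(n + 1):
        latency = sum(edge_exec[:k]) + sum(cloud_exec[k:])
        if k == 0:
            latency += input_transfer      # fully offloaded: ship the raw input
        elif k < n:
            latency += transfer[k - 1]     # ship layer k-1's output activations
        if latency < best_latency:
            best_latency, best_cut = latency, k
    return best_cut, best_latency


def proportional_shares(total_units, available_resources):
    """Phase two: split one layer's execution units (e.g., output channels)
    among same-level edge nodes in proportion to their available resources."""
    total = sum(available_resources)
    shares = [total_units * r // total for r in available_resources]
    # Hand out the remainder left by integer division to the
    # best-provisioned nodes first.
    order = sorted(range(len(shares)),
                   key=lambda i: available_resources[i], reverse=True)
    for i in order:
        if sum(shares) == total_units:
            break
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical per-layer latencies in milliseconds (illustrative only).
    edge_exec = [4.0, 6.0, 5.0, 3.0]
    cloud_exec = [1.0, 1.5, 1.2, 0.8]
    transfer = [2.0, 9.0, 1.5, 0.5]
    print(best_vertical_split(edge_exec, cloud_exec, transfer, input_transfer=6.0))
    print(proportional_shares(128, [4, 2, 2]))  # three edge nodes, unequal capacity
```

With the example numbers, the sketch picks the cut after the first layer (total latency 9.5 ms) and assigns 64/32/32 execution units to the three edge nodes; the actual framework in the paper solves the same kind of decision with its graph formulation and pruning rather than brute-force enumeration.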
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 21741-21749 |
| Number of pages | 9 |
| Journal | IEEE Internet of Things Journal |
| Volume | 11 |
| Issue number | 12 |
| DOIs | |
| Publication status | Published - 2024 Jun 15 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Deep neural network (DNN)
- inference latency
- interlayer splitting
- intralayer splitting
- two-phase split computing
ASJC Scopus subject areas
- Signal Processing
- Information Systems
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications