Accelerating Convolutional Neural Network Inference in Split Computing: An In-Network Computing Approach

Hochan Lee*, Haneul Ko, Chanbin Bae, Sangheon Pack

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    2 Citations (Scopus)

    Abstract

    Since the latest deep neural network (DNN) models are complex and have many layers, processing an entire DNN model on a mobile device is challenging. To cope with this challenge, split computing (SC) has been proposed, which divides a DNN model into multiple layers and distributes them across mobile devices and edge servers. Meanwhile, in-network computing (INC) is a promising technology that offloads computational tasks to network devices (e.g., programmable switches), thereby providing low-latency, line-rate packet processing. Although a switch cannot process complex DNN models in their entirety due to its limited computing and memory resources, it can process specific layers that require only simple arithmetic operations; for example, the max-pooling layers of convolutional neural network (CNN) models can be offloaded to the switch. In this paper, we consider a network with three types of computing nodes: mobile devices, edge servers, and switches. We formulate the problem of placing the layers of a CNN model on these computing nodes to minimize the inference latency subject to the nodes' resource constraints, and we derive the optimal placement by solving the formulated optimization problem. Evaluation results demonstrate that the optimal placement achieves lower inference latency than both a random layer placement scheme and a server-only placement scheme.
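
    The abstract makes two concrete technical points: max-pooling needs only comparisons, and layer placement can be cast as a latency-minimization problem. The Python sketches below illustrate both under stated assumptions; they are not the paper's model, and every layer profile, node speed, link rate, and delay constant in them is a hypothetical placeholder.

    First, a 2x2, stride-2 max-pooling layer reduces to four-way comparisons with no multiplications, which is why it fits the simple arithmetic available in a programmable switch's match-action pipeline:

        # Pure-Python illustration; the feature-map values are made up.
        def max_pool_2x2(fm):
            """2x2 max-pooling with stride 2 over a 2-D feature map."""
            out = []
            for i in range(0, len(fm) - 1, 2):
                row = []
                for j in range(0, len(fm[i]) - 1, 2):
                    # Each output element is just a 4-way comparison.
                    row.append(max(fm[i][j], fm[i][j + 1],
                                   fm[i + 1][j], fm[i + 1][j + 1]))
                out.append(row)
            return out

        print(max_pool_2x2([[1, 3, 2, 0],
                            [5, 2, 1, 4],
                            [0, 1, 7, 2],
                            [3, 2, 1, 6]]))  # [[5, 4], [3, 7]]

    Second, the placement problem can be sketched as choosing two cut points on the device -> switch -> server path, with the switch restricted to simple layers within a memory budget. A brute-force search over the cuts stands in for the paper's optimization solver:

        from itertools import combinations_with_replacement

        # Hypothetical per-layer profiles: (name, FLOPs, output bytes,
        # simple enough for a programmable switch?). Illustrative only.
        INPUT_BYTES = 6.0e5
        LAYERS = [
            ("conv1", 2.0e8, 8.0e5, False),
            ("pool1", 1.0e6, 2.0e5, True),   # max-pooling: comparisons only
            ("conv2", 4.0e8, 4.0e5, False),
            ("pool2", 5.0e5, 1.0e5, True),
            ("fc",    1.0e8, 4.0e3, False),
        ]
        SPEED = {"device": 1e9, "server": 1e11}      # assumed FLOP/s
        LINK = {"dev_sw": 1.25e8, "sw_srv": 1.25e9}  # assumed bytes/s
        SWITCH_DELAY = 1e-5  # assumed line-rate per-layer delay (s)
        SWITCH_MEM = 4.0e5   # assumed switch memory budget (bytes)

        def boundary_bytes(i):
            """Bytes crossing a cut placed just before layer i."""
            return INPUT_BYTES if i == 0 else LAYERS[i - 1][2]

        def latency(a, b):
            """End-to-end latency when layers [0, a) run on the device,
            [a, b) on the switch, and [b, n) on the server."""
            total, mem = 0.0, 0.0
            for _, flops, _, _ in LAYERS[:a]:        # device segment
                total += flops / SPEED["device"]
            for _, _, out_bytes, ok in LAYERS[a:b]:  # switch segment
                if not ok:
                    return float("inf")  # switch: simple arithmetic only
                mem += out_bytes
                if mem > SWITCH_MEM:
                    return float("inf")  # switch memory constraint
                total += SWITCH_DELAY
            for _, flops, _, _ in LAYERS[b:]:        # server segment
                total += flops / SPEED["server"]
            # The activation at each cut traverses the corresponding link.
            return (total + boundary_bytes(a) / LINK["dev_sw"]
                          + boundary_bytes(b) / LINK["sw_srv"])

        cuts = combinations_with_replacement(range(len(LAYERS) + 1), 2)
        best = min(cuts, key=lambda ab: latency(*ab))
        print("best cuts (device | switch | server):", best)

    Which cuts come out best depends entirely on the placeholder numbers above; the paper instead solves the placement problem exactly under the real resource constraints.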

    Original language: English
    Title of host publication: 38th International Conference on Information Networking, ICOIN 2024
    Publisher: IEEE Computer Society
    Pages: 773-776
    Number of pages: 4
    ISBN (Electronic): 9798350330946
    DOIs
    Publication status: Published - 2024
    Event: 38th International Conference on Information Networking, ICOIN 2024 - Hybrid, Ho Chi Minh City, Viet Nam
    Duration: 2024 Jan 17 - 2024 Jan 19

    Publication series

    Name: International Conference on Information Networking
    ISSN (Print): 1976-7684

    Conference

    Conference: 38th International Conference on Information Networking, ICOIN 2024
    Country/Territory: Viet Nam
    City: Hybrid, Ho Chi Minh City
    Period: 24/1/17 - 24/1/19

    Bibliographical note

    Publisher Copyright:
    © 2024 IEEE.

    Keywords

    • In-network computing
    • Split computing

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
