Multicast on-chip communication is encountered in various cache-coherence protocols targeting multi-core processors, and its pervasiveness is increasing due to the proliferation of machine learning accelerators. In-network handling of multicast traffic imposes additional switching-level restrictions to guarantee deadlock freedom, while it stresses the allocation efficiency of Network-on-Chip (NoC) routers. In this work, we propose a novel partitioned NoC router microarchitecture, called SmartFork, which employs a versatile and cost-efficient multicast packet replication scheme that allows the design of high-throughput and low-cost NoCs. The design is adapted to the average branch splitting observed in real-world multicast routing algorithms. Compared to state-of-the-art NoC multicast approaches, SmartFork is demonstrated to yield high performance in terms of latency and throughput, while still offering a cost-effective implementation.
Bibliographical notePublisher Copyright:
© 2020 Elsevier B.V.
ASJC Scopus subject areas
- Hardware and Architecture
- Electrical and Electronic Engineering