TY - GEN
T1 - SmartFork
T2 - 52nd IEEE International Symposium on Circuits and Systems, ISCAS 2020
AU - Konstantinou, Dimitrios
AU - Nicopoulos, Chrysostomos
AU - Lee, Junghee
AU - Sirakoulis, Georgios Ch
AU - Dimitrakopoulos, Giorgos
N1 - Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - Multicast on-chip communication is encountered in various cache-coherence protocols targeting multi-core processors, and its pervasiveness is increasing due to the proliferation of machine learning accelerators. In-network handling of multicast traffic imposes additional switching-level restrictions to guarantee deadlock freedom, while it stresses the allocation efficiency of Network-on-Chip (NoC) routers. In this work, we propose a novel NoC router microarchitecture, called SmartFork, which employs a versatile and cost-efficient multicast packet replication scheme that allows the design of high-throughput and low-cost NoCs. The design is adapted to the average branch splitting observed in real-world multicast routing algorithms. Compared to state-of-the-art NoC multicast approaches, SmartFork is demonstrated to yield higher performance in terms of latency and throughput, while still offering a cost-effective implementation.
AB - Multicast on-chip communication is encountered in various cache-coherence protocols targeting multi-core processors, and its pervasiveness is increasing due to the proliferation of machine learning accelerators. In-network handling of multicast traffic imposes additional switching-level restrictions to guarantee deadlock freedom, while it stresses the allocation efficiency of Network-on-Chip (NoC) routers. In this work, we propose a novel NoC router microarchitecture, called SmartFork, which employs a versatile and cost-efficient multicast packet replication scheme that allows the design of high-throughput and low-cost NoCs. The design is adapted to the average branch splitting observed in real-world multicast routing algorithms. Compared to state-of-the-art NoC multicast approaches, SmartFork is demonstrated to yield higher performance in terms of latency and throughput, while still offering a cost-effective implementation.
UR - http://www.scopus.com/inward/record.url?scp=85109338908&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85109338908
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2020 IEEE International Symposium on Circuits and Systems, ISCAS 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 October 2020 through 21 October 2020
ER -