Cache bypassing is widely employed to alleviate cache contention and pollution in GPUs. However, it often puts more pressure on the network-on-chip (NoC), because bypassed requests must traverse the NoC to reach the lower-level memories, which worsens NoC congestion. In this paper, we propose an aggressive GPU cache bypassing technique, called SC-Table, to alleviate cache contention and pollution. The SC-Table relies on 2-bit saturating counters (SCs) to store the bypass history of warps. Memory requests issued by a warp are allowed to bypass the L1 data cache (L1D) when the corresponding SC’s value reaches the bypass threshold. In addition, we adopt a monolithic 3D-based NoC (M3D NoC) to provide better NoC throughput and latency. The combination of the SC-Table and the M3D NoC improves GPU performance by 34.6%, on average, over a baseline with no cache bypassing and a traditional 2D NoC.
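The per-warp counter mechanism described in the abstract can be illustrated with a minimal sketch. The abstract states only that each warp has a 2-bit saturating counter and that requests bypass the L1D once the counter reaches a threshold. Everything else here is an assumption made for illustration: the class name `SCTable`, the threshold value of 2, the `favors_bypass` signal, and the increment/decrement update rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

SC_MAX = 3  # largest value a 2-bit saturating counter can hold (range 0..3)


@dataclass
class SCTable:
    """Illustrative table holding one 2-bit saturating counter per warp."""

    num_warps: int
    bypass_threshold: int = 2  # assumed value; the abstract does not give it
    counters: list = field(default_factory=list)

    def __post_init__(self):
        # Every warp starts with no bypass history.
        self.counters = [0] * self.num_warps

    def should_bypass(self, warp_id: int) -> bool:
        # A warp's requests bypass the L1D once its counter reaches the threshold.
        return self.counters[warp_id] >= self.bypass_threshold

    def record(self, warp_id: int, favors_bypass: bool) -> None:
        # Assumed update rule: count up on an event that favors bypassing,
        # count down otherwise, saturating at 0 and SC_MAX.
        value = self.counters[warp_id]
        if favors_bypass:
            self.counters[warp_id] = min(value + 1, SC_MAX)
        else:
            self.counters[warp_id] = max(value - 1, 0)


table = SCTable(num_warps=4)
for _ in range(5):
    table.record(0, favors_bypass=True)  # counter for warp 0 saturates at 3
```

In this sketch, warp 0 would bypass the L1D after the loop, while the other warps, whose counters are untouched, would still access the cache. What event actually drives the counters in the paper's design is not stated in the abstract.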
Bibliographical note
Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2003500, No. 2020R1A6A3A13064398), Samsung Electronics, College of Information, Korea University, and School of Information and Communications Technology, Hanoi University of Science and Technology. We would also like to thank Dr. Young Seo Lee and Mr. Ji Heon Lee for their help with thermal simulation and anonymous reviewers for their helpful feedback.
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Keywords
- 3D network-on-chip
- Cache bypassing
ASJC Scopus subject areas
- Theoretical Computer Science
- Information Systems
- Hardware and Architecture