Convolutional neural networks (CNNs) are among the most popular machine learning algorithms. The convolutional layers, which account for most of the execution time of CNNs, are implemented with matrix multiplication because the convolution operation performs dot products between filters and local regions of the input. GPUs with thousands of cores have been shown to significantly accelerate matrix multiplication compared to CPUs with a limited number of cores, especially for large matrices. However, the current memory architecture allows only one row access at a time, so multiple accesses are necessary to read the column data of the second matrix, which slows down matrix multiplication. In this study, we adopt monolithic 3-D (M3D) integration for the GPU scratchpad memory (SPM), referred to as the M3D SPM, to enhance matrix multiplication. The M3D SPM allows the column data of the second matrix to be read in one access, as with the row data of the first matrix. The simulation results show that our M3D SPM improves system performance by 46.3% for 32 × 32 matrix multiplication over the conventional 2-D SPM, where the column data of the second matrix are read sequentially.
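The access pattern described in the abstract can be illustrated with a toy counting model. The sketch below is not the authors' simulator: the function name `matmul_with_access_count`, the `column_access` flag, and the assumption that one memory access returns exactly one full row (or, in the M3D-like mode, one full column) are all hypothetical simplifications. It only counts accesses; it does not model timing, so it says nothing about the 46.3% figure, which comes from the authors' simulation.

```python
def matmul_with_access_count(a, b, column_access):
    """Multiply a (n x k) by b (k x m) and count memory accesses in a toy model.

    Assumption (hypothetical): one access returns one full row of a matrix.
    With column_access=True, one access can also return one full column,
    loosely mimicking the M3D SPM behaviour described in the abstract.
    """
    n, k, m = len(a), len(b), len(b[0])
    accesses = 0
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Row i of the first matrix: always a single access.
            row = a[i]
            accesses += 1
            if column_access:
                # Column j of the second matrix in a single access.
                col = [b[t][j] for t in range(k)]
                accesses += 1
            else:
                # Row-only memory: one access per element of the column.
                col = []
                for t in range(k):
                    col.append(b[t][j])
                    accesses += 1
            c[i][j] = sum(x * y for x, y in zip(row, col))
    return c, accesses


# Usage: compare the two access modes on a 32 x 32 multiplication.
n = 32
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i - j for j in range(n)] for i in range(n)]
c_2d, acc_2d = matmul_with_access_count(a, b, column_access=False)
c_m3d, acc_m3d = matmul_with_access_count(a, b, column_access=True)
print(acc_2d, acc_m3d)  # 33792 2048
```

Both modes produce the same product; only the access count differs. In this model each output element costs 1 + 32 accesses with row-only memory and 1 + 1 accesses when a column can be read at once.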
Bibliographical note
Funding Information:
Manuscript received April 10, 2020; accepted May 10, 2020. Date of publication June 12, 2020; date of current version May 27, 2021. This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) under Grant 2020R1A2C2003500, in part by Samsung Electronics, Korea University, and in part by the Hanoi University of Science and Technology. This manuscript was recommended for publication by J. Hu. (Corresponding authors: Sung Woo Chung; Cheol Hong Kim.) Cong Thuan Do was with the Department of Computer Science and Engineering, Korea University, Seoul 02841, South Korea. He is now with the School of Information and Communications Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam (e-mail: firstname.lastname@example.org).
© 2009-2012 IEEE.
Keywords
- High performance
- matrix multiplication
- monolithic 3-D
- neural network
- scratchpad memory (SPM)
ASJC Scopus subject areas
- Control and Systems Engineering
- Computer Science(all)