TY - GEN
T1 - Design of floating-point MAC unit for computing DNN applications in PIM
AU - Lee, Hun Jae
AU - Kim, Chang Hyun
AU - Kim, Seon Wook
N1 - Funding Information:
This paper was a result of a research project supported by SK hynix Inc.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - Deep learning models are generally trained using a single-precision floating-point number system. For inference, however, simpler number systems such as integer and fixed-point are used because of their small design area and low power consumption, despite the accuracy loss and the quantization parameter overhead that quantization introduces. In general, a floating-point MAC unit is considered unsuitable for inference engines, especially for area-, power-, and heat-sensitive devices such as Processor-In-Memory (PIM). In this paper, we propose an efficient MAC unit design based on the bfloat16 format that is suitable for neural network operations and considers the characteristics of the data used in deep learning. Our techniques simplify the design by removing the circuits for handling underflow, overflow, and normalization from the critical path and treating these cases as exceptions. In addition, we improve computational accuracy by extending the mantissa bit-width inside the MAC unit and eliminating the unnecessary normalization performed at every computation. Compared with a MAC unit without our optimizations, synthesized with the Samsung 65nm library, we reduced the delay of a non-pipelined MAC unit by 47.3%, its area by 9.1%, and its power consumption by 24.2%. Furthermore, we show that the proposed bfloat16 MAC unit outperforms a 16-bit integer MAC unit in terms of area and power consumption. We also present the design of a 1 GHz, 3-stage pipelined MAC unit together with its performance analysis.
AB - Deep learning models are generally trained using a single-precision floating-point number system. For inference, however, simpler number systems such as integer and fixed-point are used because of their small design area and low power consumption, despite the accuracy loss and the quantization parameter overhead that quantization introduces. In general, a floating-point MAC unit is considered unsuitable for inference engines, especially for area-, power-, and heat-sensitive devices such as Processor-In-Memory (PIM). In this paper, we propose an efficient MAC unit design based on the bfloat16 format that is suitable for neural network operations and considers the characteristics of the data used in deep learning. Our techniques simplify the design by removing the circuits for handling underflow, overflow, and normalization from the critical path and treating these cases as exceptions. In addition, we improve computational accuracy by extending the mantissa bit-width inside the MAC unit and eliminating the unnecessary normalization performed at every computation. Compared with a MAC unit without our optimizations, synthesized with the Samsung 65nm library, we reduced the delay of a non-pipelined MAC unit by 47.3%, its area by 9.1%, and its power consumption by 24.2%. Furthermore, we show that the proposed bfloat16 MAC unit outperforms a 16-bit integer MAC unit in terms of area and power consumption. We also present the design of a 1 GHz, 3-stage pipelined MAC unit together with its performance analysis.
KW - Bfloat16
KW - Deep Neural Network
KW - Exception
KW - Floating-point MAC unit
KW - Normalization
UR - http://www.scopus.com/inward/record.url?scp=85083504901&partnerID=8YFLogxK
U2 - 10.1109/ICEIC49074.2020.9050989
DO - 10.1109/ICEIC49074.2020.9050989
M3 - Conference contribution
AN - SCOPUS:85083504901
T3 - 2020 International Conference on Electronics, Information, and Communication, ICEIC 2020
BT - 2020 International Conference on Electronics, Information, and Communication, ICEIC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 International Conference on Electronics, Information, and Communication, ICEIC 2020
Y2 - 19 January 2020 through 22 January 2020
ER -