TY - JOUR
T1 - Low-overhead inverted LUT design for bounded DNN activation functions on floating-point vector ALUs
AU - Kim, Seok Young
AU - Kim, Chang Hyun
AU - Lee, Won Joon
AU - Park, Il
AU - Kim, Seon Wook
N1 - Funding Information:
This paper was a result of a research project supported by SK hynix Inc., and the EDA tool was supported by the IC Design Education Center (IDEC), South Korea.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/9
Y1 - 2022/9
N2 - An inference engine uses floating-point numbers to provide high accuracy in deep neural network computing despite its limited computing resources. However, the computation of non-linear activation functions becomes a performance bottleneck, which we may alleviate by adopting a lookup table (LUT) method. A characteristic of the floating-point number system, in which the intervals between representable mantissa values differ depending on the exponent, makes it challenging to calculate LUT index values and produce error-tolerant outputs. This paper proposes a floating-point-based lookup table (FP-LUT) that produces minimal errors and requires negligible hardware cost, especially for vector arithmetic logic units (ALUs), using bfloat16, which has recently been adopted for both inference and training. Instead of calculating the index from the function input value, we apply the principle of an inverse function to our design, targeting bounded DNN activation functions in particular. We divide the range of function output values linearly by the number of LUT entries and store the corresponding input values in the LUT. Then, we compare the incoming input value with the stored LUT values, find the corresponding address, and convert it into an FP format for the output. We applied our 32-entry FP-LUT to an in-house 8-way bfloat16 MAC unit to support four DNN activation functions: logistic sigmoid, hyperbolic tangent, softsign, and ISRU, incurring only 1.22% area and 0.46% power consumption overhead. Our accuracy analysis shows that, with an entry count only 1/8 that of state-of-the-art 16-bit fixed-point LUT methods and this small logic overhead, FP-LUT reduces the average errors by 51.8%, 28.4%, 14.4%, and 26.1% in those functions on our test datasets, respectively. Additionally, we show that our scheme satisfies all application-defined accuracy requirements.
AB - An inference engine uses floating-point numbers to provide high accuracy in deep neural network computing despite its limited computing resources. However, the computation of non-linear activation functions becomes a performance bottleneck, which we may alleviate by adopting a lookup table (LUT) method. A characteristic of the floating-point number system, in which the intervals between representable mantissa values differ depending on the exponent, makes it challenging to calculate LUT index values and produce error-tolerant outputs. This paper proposes a floating-point-based lookup table (FP-LUT) that produces minimal errors and requires negligible hardware cost, especially for vector arithmetic logic units (ALUs), using bfloat16, which has recently been adopted for both inference and training. Instead of calculating the index from the function input value, we apply the principle of an inverse function to our design, targeting bounded DNN activation functions in particular. We divide the range of function output values linearly by the number of LUT entries and store the corresponding input values in the LUT. Then, we compare the incoming input value with the stored LUT values, find the corresponding address, and convert it into an FP format for the output. We applied our 32-entry FP-LUT to an in-house 8-way bfloat16 MAC unit to support four DNN activation functions: logistic sigmoid, hyperbolic tangent, softsign, and ISRU, incurring only 1.22% area and 0.46% power consumption overhead. Our accuracy analysis shows that, with an entry count only 1/8 that of state-of-the-art 16-bit fixed-point LUT methods and this small logic overhead, FP-LUT reduces the average errors by 51.8%, 28.4%, 14.4%, and 26.1% in those functions on our test datasets, respectively. Additionally, we show that our scheme satisfies all application-defined accuracy requirements.
KW - Activation functions
KW - Bfloat16
KW - Deep neural networks
KW - Lookup table
UR - http://www.scopus.com/inward/record.url?scp=85133870087&partnerID=8YFLogxK
U2 - 10.1016/j.micpro.2022.104592
DO - 10.1016/j.micpro.2022.104592
M3 - Article
AN - SCOPUS:85133870087
SN - 0141-9331
VL - 93
JO - Microprocessors and Microsystems
JF - Microprocessors and Microsystems
M1 - 104592
ER -
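
The abstract walks through an inverted-LUT scheme: divide the bounded output range into equal segments, store the corresponding input breakpoints obtained via the inverse function, match the incoming input against those breakpoints, and convert the matched address back into an output value. The following is a minimal software sketch of that principle only, not the authors' bfloat16 hardware FP-LUT; the choice of tanh, the midpoint output reconstruction, and the use of Python floats and bisect for the comparator stage are assumptions made for illustration.

```python
# Illustrative sketch of the inverted-LUT principle from the abstract
# (not the paper's hardware design): for a bounded, monotonic activation
# such as tanh, the output range is split into equal segments, the LUT
# stores the *input* breakpoints obtained from the inverse function, and
# the output is reconstructed from the matched address.
import bisect
import math

N_ENTRIES = 32                      # matches the 32-entry FP-LUT in the paper
Y_MIN, Y_MAX = -1.0, 1.0            # bounded output range of tanh

# Output-range breakpoints (linear in the *output* domain) and their
# preimages under the inverse function atanh, stored as the LUT.
step = (Y_MAX - Y_MIN) / N_ENTRIES
y_breaks = [Y_MIN + i * step for i in range(1, N_ENTRIES)]
x_lut = [math.atanh(y) for y in y_breaks]   # stored input thresholds

def tanh_fp_lut(x: float) -> float:
    """Approximate tanh(x) by comparing x against the stored input
    thresholds and converting the matched address back to an output."""
    addr = bisect.bisect_left(x_lut, x)     # comparator stage: find the segment
    y_lo = Y_MIN + addr * step              # lower edge of that output segment
    return y_lo + 0.5 * step                # segment midpoint as the output

if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(f"x={x:+.2f}  lut={tanh_fp_lut(x):+.4f}  exact={math.tanh(x):+.4f}")
```

Because the segments are linear in the output domain rather than the input domain, the approximation error is bounded by half a segment width everywhere the function is monotonic, which is the property the inverted indexing exploits.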