TY - GEN
T1 - Implementation of Pipelined Adder Tree for Long Short-Term Memory Cells
AU - Kim, Seok Young
AU - Kim, Chang Hyun
AU - Kim, Seon Wook
N1 - Funding Information:
This work was supported in part by SK Hynix Inc.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/6/27
Y1 - 2021/6/27
N2 - The recurrent and reduction characteristics of RNNs (Recurrent Neural Networks) may cause inefficient execution on conventional hardware such as the CPU (Central Processing Unit) and GPU (Graphics Processing Unit). The large amount of recurrent data prevents the CPU from exploiting locality, while the reduction operations limit GPU parallelism. As a result, several dedicated hardware designs for efficient RNN execution have been proposed recently. In this paper, we propose a processing element that performs the primary operations (i.e., the multiply-accumulate operation and the activation function operation) of LSTM (Long Short-Term Memory), the most dominant RNN model. We implemented the processing element on an FPGA (Field-Programmable Gate Array) and verified its accuracy by measuring the BLEU (Bilingual Evaluation Understudy) score of a PyTorch-based LSTM application; the score decreased by 2.3% compared to the CPU result.
AB - The recurrent and reduction characteristics of RNNs (Recurrent Neural Networks) may cause inefficient execution on conventional hardware such as the CPU (Central Processing Unit) and GPU (Graphics Processing Unit). The large amount of recurrent data prevents the CPU from exploiting locality, while the reduction operations limit GPU parallelism. As a result, several dedicated hardware designs for efficient RNN execution have been proposed recently. In this paper, we propose a processing element that performs the primary operations (i.e., the multiply-accumulate operation and the activation function operation) of LSTM (Long Short-Term Memory), the most dominant RNN model. We implemented the processing element on an FPGA (Field-Programmable Gate Array) and verified its accuracy by measuring the BLEU (Bilingual Evaluation Understudy) score of a PyTorch-based LSTM application; the score decreased by 2.3% compared to the CPU result.
KW - Bfloat16
KW - Floating-point MAC unit
KW - Long Short-Term Memory
KW - Recurrent Neural Networks
UR - http://www.scopus.com/inward/record.url?scp=85117080171&partnerID=8YFLogxK
U2 - 10.1109/ITC-CSCC52171.2021.9553004
DO - 10.1109/ITC-CSCC52171.2021.9553004
M3 - Conference contribution
AN - SCOPUS:85117080171
T3 - 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021
BT - 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 36th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2021
Y2 - 27 June 2021 through 30 June 2021
ER -