Scale-CIM: Precision-scalable computing-in-memory for energy-efficient quantized neural networks

Young Seo Lee, Young Ho Gong, Sung Woo Chung

Research output: Contribution to journal › Article › peer-review

Abstract

Quantized neural networks (QNNs), which perform multiply-accumulate (MAC) operations with low-precision weights or activations, have been widely exploited to reduce energy consumption. QNNs trade off energy consumption against accuracy depending on the quantized precision, so an appropriate precision must be selected for energy efficiency. Nevertheless, conventional hardware accelerators such as the Google TPU are typically designed and optimized for a specific precision (e.g., 8-bit), which can degrade energy efficiency at other precisions. Although an analog computing-in-memory (CIM) technology supporting variable precision has been proposed to improve energy efficiency, its implementation requires extremely large and power-hungry analog-to-digital converters (ADCs). In this paper, we propose Scale-CIM, a precision-scalable CIM architecture that supports MAC operations based on digital (not analog) computations. Scale-CIM performs binary MAC operations with high parallelism, executing digital multiplication operations in the CIM array and accumulation operations in the peripheral logic. In addition, Scale-CIM supports multi-bit MAC operations without ADCs, building on the binary MAC operations and applying shift operations according to the precision. Since Scale-CIM fully utilizes the CIM array across various quantized precisions (rather than for one specific precision), it achieves high compute throughput. Consequently, Scale-CIM enables precision-scalable CIM-based MAC operations with high parallelism. Our simulation results show that Scale-CIM achieves a 1.5–15.8× speedup and reduces system energy consumption by 53.7–95.7% across different quantized precisions, compared to the state-of-the-art precision-scalable accelerator.
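The shift-based decomposition the abstract describes can be illustrated with a minimal sketch: a multi-bit MAC is built from binary MACs over bit planes, with each partial result weighted by a power of two. All function names here are illustrative assumptions, not the paper's implementation, and signed quantization is omitted for brevity.

```python
# Sketch of precision-scalable MAC via binary MACs and shifts (illustrative,
# not Scale-CIM's actual circuit): unsigned operands are decomposed into bit
# planes, each bit-plane pair yields a binary MAC, and partials are
# shift-accumulated according to the operand precisions.

def binary_mac(w_bits, a_bits):
    # Binary MAC: 1-bit multiplies are ANDs; accumulation is a popcount,
    # mirroring multiplication in the CIM array and accumulation in the
    # peripheral logic.
    return sum(w & a for w, a in zip(w_bits, a_bits))

def bit_plane(values, bit):
    # Extract one bit plane from a vector of unsigned integers.
    return [(v >> bit) & 1 for v in values]

def multibit_mac(weights, activations, w_prec, a_prec):
    # Multi-bit MAC from binary MACs: the (i, j) bit-plane pair contributes
    # its binary MAC result shifted left by i + j. The loop bounds scale
    # with the chosen precisions, so the same binary hardware serves any
    # quantization level.
    acc = 0
    for i in range(w_prec):
        for j in range(a_prec):
            partial = binary_mac(bit_plane(weights, i),
                                 bit_plane(activations, j))
            acc += partial << (i + j)
    return acc

# 2-bit example: the decomposed result matches the direct dot product.
weights = [3, 1, 2, 0]
activations = [1, 3, 2, 2]
print(multibit_mac(weights, activations, 2, 2))  # 10, same as sum(w*a)
```

Note that the number of binary MAC passes grows with `w_prec * a_prec`, which is why selecting a lower precision directly reduces work, the trade-off the abstract highlights.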

Original language: English
Article number: 102787
Journal: Journal of Systems Architecture
Volume: 134
DOIs
Publication status: Published - January 2023

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2003500, No. 2020R1A6A3A13064398, and No. 2020R1G1A1100040), and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00441-001, Memory-Centric Architecture Using the Reconfigurable PIM Devices).

Publisher Copyright:
© 2022 Elsevier B.V.

Keywords

  • Digital-based computing-in-memory
  • Precision-scalable computation
  • Quantized neural networks

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
