Optimization of Microarchitecture and Dataflow for Sparse Tensor CNN Acceleration

Ngoc Son Pham, Taeweon Suh

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


The inherent sparsity present in convolutional neural networks (CNNs) offers a valuable opportunity to significantly decrease the computational workload during inference. Nevertheless, exploiting unstructured sparsity typically comes at the cost of increased complexity or substantial hardware overhead in accelerators. To address these challenges, this research introduces an innovative inner-join mechanism that effectively reduces the size and power consumption of the sparsity-handling circuit. Additionally, a novel dataflow named Channel Stacking of Sparse Tensors (CSSpa) is presented, which maximizes data reuse to minimize memory accesses, a major contributor to overall power consumption. In comprehensive simulations, CSSpa demonstrates a 1.6× speedup and a 5.6× reduction in SRAM accesses when executing inference on the ResNet50 model, compared to the existing SparTen architecture. Furthermore, the implementation results show a 2.32× improvement in hardware resource efficiency and a 3.3× improvement in energy efficiency over SparTen.
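The inner join the abstract refers to pairs up positions where both the activation and the weight are nonzero, so multiply-accumulate work is spent only on those positions. The paper's circuit-level design is not reproduced here, but the idea can be sketched in software, assuming a bitmask sparse encoding (as used in SparTen-style accelerators): each vector is stored as a bitmask of nonzero positions plus a dense list of the nonzero values. All names below are illustrative.

```python
def sparse_inner_join(a_vals, a_mask, w_vals, w_mask):
    """Dot product of two bitmask-encoded sparse vectors.

    a_mask / w_mask: ints whose set bits mark nonzero positions.
    a_vals / w_vals: the nonzero values, in ascending position order.
    Only positions where BOTH masks are set contribute to the result.
    """
    both = a_mask & w_mask  # the "inner join" of nonzero positions
    acc = 0
    pos = 0
    while both:
        if both & 1:
            # A prefix popcount converts a bit position into an index
            # into the compressed value list.
            ai = bin(a_mask & ((1 << pos) - 1)).count("1")
            wi = bin(w_mask & ((1 << pos) - 1)).count("1")
            acc += a_vals[ai] * w_vals[wi]
        both >>= 1
        pos += 1
    return acc

# Dense vectors [5, 0, 3, 0] . [2, 4, 6, 7] = 5*2 + 3*6 = 28,
# but only the two joined positions are ever multiplied.
print(sparse_inner_join([5, 3], 0b0101, [2, 4, 6, 7], 0b1111))  # → 28
```

In hardware, the prefix popcounts and the mask AND are cheap combinational logic, which is why an inner join of bitmasks can cut sparsity-handling area and power relative to index-comparison schemes.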

Original language: English
Pages (from-to): 108818-108832
Number of pages: 15
Journal: IEEE Access
Publication status: Published - 2023

Bibliographical note

Publisher Copyright:
© 2013 IEEE.


Keywords

  • AI accelerator
  • convolutional neural networks (CNNs)
  • data compression
  • dataflow
  • network-on-chip (NoC)

ASJC Scopus subject areas

  • General Engineering
  • General Materials Science
  • General Computer Science


