Near-Memory Computing With Compressed Embedding Table for Personalized Recommendation

Jeongmin Lim, Young Geun Kim, Sung Woo Chung, Farinaz Koushanfar, Joonho Kong

Research output: Contribution to journal › Article › peer-review

Abstract

Deep learning (DL)-based recommendation models play an important role in many real-world applications. However, the embedding layer, a key part of DL-based recommendation models, requires sparse memory accesses to a very large memory space, followed by pooling (i.e., reduction) operations. This forces the system to overprovision memory capacity for model deployment. Moreover, a conventional CPU-based architecture has difficulty exploiting locality in these accesses, which places a heavy data-transfer burden between the CPU and memory. To resolve these problems, we propose an embedding vector element quantization and compression method that reduces the memory footprint (capacity) required by the embedding tables. In addition, to reduce the amount of data transfer and memory access, we propose near-memory acceleration hardware with an SRAM buffer that stores frequently accessed embedding vectors. Our quantization and compression method achieves compression ratios of 3.95–4.14 for embedding tables in widely used datasets while negligibly affecting inference accuracy. Our acceleration technique uses 3D-stacked DRAM, which facilitates near-memory processing in the logic die with high DRAM bandwidth; it achieves a 4.9×–5.4× embedding layer speedup over 8-core CPU-based execution while reducing memory energy consumption by 5.9×–12.1×, on average.
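
To make the embedding-layer workload concrete, the following is a minimal sketch of a quantized embedding table with a sum-pooling lookup. It is not the authors' method: the abstract does not specify the quantization or compression scheme, so the sketch assumes a generic row-wise 8-bit quantization with a per-row scale and offset. All function names (`quantize_row`, `dequantize_row`, `pooled_lookup`, `footprint_ratio`) are hypothetical.

```python
# Illustrative only: generic row-wise 8-bit quantization, NOT the paper's scheme.
from typing import List, Tuple

Row = Tuple[bytes, float, float]  # (8-bit codes, scale, minimum value)


def quantize_row(vec: List[float]) -> Row:
    """Map each float to an 8-bit code using a per-row scale and offset."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant rows
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, scale, lo


def dequantize_row(row: Row) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    codes, scale, lo = row
    return [c * scale + lo for c in codes]


def pooled_lookup(table: List[Row], indices: List[int]) -> List[float]:
    """Gather the rows named by `indices` and sum-pool them element-wise."""
    dim = len(table[0][0])
    out = [0.0] * dim
    for i in indices:  # sparse accesses: indices are typically scattered
        for d, x in enumerate(dequantize_row(table[i])):
            out[d] += x
    return out


def footprint_ratio(dim: int) -> float:
    """float32 bytes per row over quantized bytes per row
    (dim one-byte codes plus two float32 parameters, an assumed layout)."""
    return (4 * dim) / (dim + 8)
```

Under this assumed layout, the per-row parameters keep the footprint reduction somewhat below 4× for small vector dimensions. The compression ratios reported in the abstract come from the authors' own quantization and compression method, which this sketch does not reproduce.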

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Emerging Topics in Computing
Publication status: Accepted/In press - 2023

Bibliographical note

Publisher Copyright:
IEEE

Keywords

  • Compression
  • Data transfer
  • Energy consumption
  • Hardware
  • Memory management
  • Quantization (signal)
  • Random access memory
  • Table lookup
  • embedding table
  • inference
  • near-memory processing
  • personalized recommendation model

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
