Abstract
Deep learning (DL)-based recommendation models play an important role in many real-world applications. However, the embedding layer, a key part of DL-based recommendation models, requires sparse memory accesses to a very large memory space, followed by pooling operations (i.e., reduction operations). This forces the system to overprovision memory capacity for model deployment. Moreover, with a conventional CPU-based architecture it is difficult to exploit locality, which places a heavy burden on data transfer between the CPU and memory. To resolve these problems, we propose an embedding vector element quantization and compression method that reduces the memory footprint (capacity) required by the embedding tables. In addition, to reduce the amount of data transfer and memory access, we propose near-memory acceleration hardware with an SRAM buffer that stores frequently accessed embedding vectors. Our quantization and compression method achieves compression ratios of 3.95 to 4.14 for embedding tables in widely used datasets while negligibly affecting inference accuracy. Our acceleration technique uses 3D-stacked DRAM, which enables near-memory processing in the logic die with high DRAM bandwidth. On average, it yields a 4.9× to 5.4× embedding layer speedup over 8-core CPU-based execution while reducing memory energy consumption by 5.9× to 12.1×.
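To make the embedding-layer access pattern concrete, the following minimal sketch shows a sparse gather followed by sum pooling over an 8-bit quantized table. It is an illustration under stated assumptions, not the method from the paper: the abstract does not specify the quantization scheme, so generic per-row affine quantization is assumed here, and the names `quantize_row`, `dequantize_row`, and `pooled_lookup`, the table size, and the index list are all hypothetical.

```python
import random


def quantize_row(row):
    """Assumed scheme: per-row 8-bit affine quantization -> (codes, scale, offset)."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant rows
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in row)
    return codes, scale, lo


def dequantize_row(codes, scale, offset):
    """Recover approximate float values from the 8-bit codes."""
    return [c * scale + offset for c in codes]


def pooled_lookup(table, indices):
    """Gather the requested rows (sparse accesses) and reduce them by summation."""
    dim = len(table[0][0])
    out = [0.0] * dim
    for i in indices:
        codes, scale, offset = table[i]
        for d, c in enumerate(codes):
            out[d] += c * scale + offset
    return out


random.seed(0)
NUM_ROWS, DIM = 1000, 64  # hypothetical table shape
fp32_table = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_ROWS)]
q_table = [quantize_row(row) for row in fp32_table]

indices = [3, 17, 256, 999]  # hypothetical sparse feature IDs for one sample
exact = [sum(fp32_table[i][d] for i in indices) for d in range(DIM)]
approx = pooled_lookup(q_table, indices)
max_err = max(abs(a - b) for a, b in zip(exact, approx))

# Footprint estimate: 4 bytes per fp32 element versus 1 byte per code
# plus an fp32 scale and offset (8 bytes) stored per row.
fp32_bytes = NUM_ROWS * DIM * 4
q_bytes = NUM_ROWS * (DIM + 8)
ratio = fp32_bytes / q_bytes

print(f"max pooling error: {max_err:.4f}, footprint ratio: {ratio:.2f}")
```

In this sketch the footprint ratio stays somewhat below 4 because of the per-row scale and offset; the ratios reported in the abstract come from the authors' own quantization and compression method, which this example does not reproduce.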
| Original language | English |
|---|---|
| Pages (from-to) | 938-951 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Emerging Topics in Computing |
| Volume | 12 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Compression
- embedding table
- inference
- near-memory processing
- personalized recommendation model
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Information Systems
- Human-Computer Interaction
- Computer Science Applications
Fingerprint
Dive into the research topics of 'Near-Memory Computing With Compressed Embedding Table for Personalized Recommendation'. Together they form a unique fingerprint.