Summarizer: Trading communication with computing near storage

Gunjae Koo, Kiran Kumar Matam, I. Te, H. V. Krishna Giri Narra, Jing Li, Hung Wei Tseng, Steven Swanson, Murali Annavaram

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

91 Citations (Scopus)


Modern data center solid state drives (SSDs) integrate multiple general-purpose embedded cores to manage the flash translation layer, garbage collection, wear-leveling, and similar tasks, improving the performance and reliability of SSDs. As the performance of these cores steadily improves, there are opportunities to repurpose them to perform application-driven computations on stored data, with the aim of reducing the communication between the host processor and the SSD. Reducing host-SSD bandwidth demand cuts down the I/O time, which is a bottleneck for many applications operating on large data sets. However, embedded core performance is still significantly lower than that of the host processor, as wimpy embedded cores are generally used within SSDs to keep costs down. There is therefore a trade-off between the computation overhead associated with near-SSD processing and the reduction in communication overhead to the host system. In this work, we design a set of application programming interfaces (APIs) that can be used by the host application to offload a data-intensive task to the SSD processor. We describe how these APIs can be implemented by simple modifications to the existing Non-Volatile Memory Express (NVMe) command interface between the host and the SSD processor. We then quantify the computation versus communication trade-offs for near-storage computing using applications from two important domains, namely data analytics and data integration. Using a fully functional SSD evaluation platform, we perform design space exploration of our proposed approach by varying the bandwidth and computation capabilities of the SSD processor. We evaluate static and dynamic approaches for dividing the work between the host and SSD processor, and show that our design may improve the performance by up to 20% when compared to processing at the host processor only, and 6× when compared to processing at the SSD processor only.
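The computation versus communication trade-off described in the abstract can be illustrated with a back-of-the-envelope cost model. The sketch below is not the paper's API or its offloading policy; it is a minimal illustration assuming a strictly serial model (no overlap of transfer and compute), and every name and number in it (`Task`, `host_time`, `ssd_time`, `choose`, the byte counts and timings) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of one data-intensive task."""
    input_bytes: float     # data that must be read from flash
    output_bytes: float    # size of the result if computed inside the SSD
    host_compute_s: float  # compute time on the (fast) host processor
    ssd_compute_s: float   # compute time on the (slow) embedded SSD core


def host_time(task: Task, link_bytes_per_s: float) -> float:
    """Ship all input over the host-SSD link, then compute on the host."""
    return task.input_bytes / link_bytes_per_s + task.host_compute_s


def ssd_time(task: Task, link_bytes_per_s: float) -> float:
    """Compute inside the SSD, then ship only the (smaller) result."""
    return task.ssd_compute_s + task.output_bytes / link_bytes_per_s


def choose(task: Task, link_bytes_per_s: float) -> str:
    """Pick whichever side finishes sooner under this simple model."""
    if ssd_time(task, link_bytes_per_s) < host_time(task, link_bytes_per_s):
        return "ssd"
    return "host"


# Assumed numbers: 1 GB scanned, 1 MB result, embedded core 5x slower.
task = Task(input_bytes=1e9, output_bytes=1e6,
            host_compute_s=0.2, ssd_compute_s=1.0)

slow_link = choose(task, 1e9)  # 1 GB/s link: transfer dominates
fast_link = choose(task, 4e9)  # 4 GB/s link: slow core dominates
print(slow_link, fast_link)
```

Under these assumed numbers the slower link favors in-SSD processing, because avoiding the 1 GB transfer outweighs the slower core, while the faster link favors the host. A real system would also need to account for overlap between I/O and compute and for splitting work between both sides.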

Original language: English
Title of host publication: MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
Publisher: IEEE Computer Society
Number of pages: 13
ISBN (Electronic): 9781450349529
Publication status: Published - 2017 Oct 14
Externally published: Yes
Event: 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017 - Cambridge, United States
Duration: 2017 Oct 14 → 2017 Oct 18

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: Part F131207
ISSN (Print): 1072-4451


Conference: 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
Country/Territory: United States

Bibliographical note

Publisher Copyright:
© 2017 Association for Computing Machinery.


  • Dynamic workload offloading
  • Near data processing
  • SSD
  • Storage systems

ASJC Scopus subject areas

  • Hardware and Architecture

