TY - GEN
T1 - Summarizer
T2 - 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
AU - Koo, Gunjae
AU - Matam, Kiran Kumar
AU - Te, I.
AU - Narra, H. V. Krishna Giri
AU - Li, Jing
AU - Tseng, Hung Wei
AU - Swanson, Steven
AU - Annavaram, Murali
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/10/14
Y1 - 2017/10/14
N2 - Modern data center solid state drives (SSDs) integrate multiple general-purpose embedded cores to manage the flash translation layer, garbage collection, wear-leveling, and other tasks, improving the performance and reliability of SSDs. As the performance of these cores steadily improves, there are opportunities to repurpose them to perform application-driven computations on stored data, with the aim of reducing communication between the host processor and the SSD. Reducing host-SSD bandwidth demand cuts down I/O time, which is a bottleneck for many applications operating on large data sets. However, embedded core performance is still significantly lower than that of the host processor, since wimpy embedded cores are generally used within SSDs for cost-effectiveness. There is thus a trade-off between the computation overhead of near-SSD processing and the reduction in communication overhead to the host system. In this work, we design a set of application programming interfaces (APIs) that a host application can use to offload a data-intensive task to the SSD processor. We describe how these APIs can be implemented through simple modifications to the existing Non-Volatile Memory Express (NVMe) command interface between the host and the SSD processor. We then quantify the computation versus communication trade-offs for near-storage computing using applications from two important domains, namely data analytics and data integration. Using a fully functional SSD evaluation platform, we perform a design space exploration of our proposed approach by varying the bandwidth and computation capabilities of the SSD processor. We evaluate static and dynamic approaches for dividing work between the host and the SSD processor, and show that our design may improve performance by up to 20% compared to processing on the host processor alone, and by up to 6× compared to processing on the SSD processor alone.
AB - Modern data center solid state drives (SSDs) integrate multiple general-purpose embedded cores to manage the flash translation layer, garbage collection, wear-leveling, and other tasks, improving the performance and reliability of SSDs. As the performance of these cores steadily improves, there are opportunities to repurpose them to perform application-driven computations on stored data, with the aim of reducing communication between the host processor and the SSD. Reducing host-SSD bandwidth demand cuts down I/O time, which is a bottleneck for many applications operating on large data sets. However, embedded core performance is still significantly lower than that of the host processor, since wimpy embedded cores are generally used within SSDs for cost-effectiveness. There is thus a trade-off between the computation overhead of near-SSD processing and the reduction in communication overhead to the host system. In this work, we design a set of application programming interfaces (APIs) that a host application can use to offload a data-intensive task to the SSD processor. We describe how these APIs can be implemented through simple modifications to the existing Non-Volatile Memory Express (NVMe) command interface between the host and the SSD processor. We then quantify the computation versus communication trade-offs for near-storage computing using applications from two important domains, namely data analytics and data integration. Using a fully functional SSD evaluation platform, we perform a design space exploration of our proposed approach by varying the bandwidth and computation capabilities of the SSD processor. We evaluate static and dynamic approaches for dividing work between the host and the SSD processor, and show that our design may improve performance by up to 20% compared to processing on the host processor alone, and by up to 6× compared to processing on the SSD processor alone.
KW - Dynamic workload offloading
KW - Near data processing
KW - SSD
KW - Storage systems
UR - http://www.scopus.com/inward/record.url?scp=85034032987&partnerID=8YFLogxK
U2 - 10.1145/3123939.3124553
DO - 10.1145/3123939.3124553
M3 - Conference contribution
AN - SCOPUS:85034032987
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 219
EP - 231
BT - MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
PB - IEEE Computer Society
Y2 - 14 October 2017 through 18 October 2017
ER -