TY - GEN
T1 - Stealth ECC
T2 - 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
AU - Lee, Young Seo
AU - Koo, Gunjae
AU - Gong, Young Ho
AU - Chung, Sung Woo
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2003500 and No. 2020R1A6A3A13064398), Samsung Electronics, and College of Information, Korea University. Sung Woo Chung and Young-Ho Gong are the co-corresponding authors of this paper.
Publisher Copyright:
© 2022 EDAA.
PY - 2022
Y1 - 2022
AB - As DRAM process technology scales down and DRAM density continues to grow, DRAM errors have become a primary concern in modern data centers. Data centers typically adopt memory systems protected by a single-error-correction, double-error-detection (SECDED) code. However, the SECDED code is no longer sufficient to satisfy DRAM reliability demands as memory systems become more vulnerable. Although servers in data centers employ strong ECC schemes, such schemes incur substantial performance and/or storage overhead. In this paper, we propose Stealth ECC, a cost-effective memory protection scheme that provides stronger error correctability than the conventional SECDED code, with negligible performance overhead and no storage overhead. Stealth ECC adaptively selects an ECC scheme depending on the data width (narrow-width or full-width). For narrow-width values, Stealth ECC provides multi-bit error correctability by storing additional parity bits on the MSB side in place of zeros. Furthermore, with bitwise interleaved data placement across x4 DRAM chips, Stealth ECC is robust to a single DRAM chip error for narrow-width values. For full-width values, Stealth ECC adopts the SECDED code, which maintains DRAM reliability comparable to the conventional SECDED code. As a result, the reliability improvement for narrow-width values enhances overall DRAM reliability while incurring negligible performance overhead and no storage overhead. Our simulation results show that Stealth ECC reduces the probability of system failure (caused by DRAM errors) by 47.9% on average, with only 0.9% performance overhead compared to the conventional SECDED code.
KW - DRAM reliability
KW - chip error resilience
KW - error correction code
KW - narrow-width value
UR - http://www.scopus.com/inward/record.url?scp=85130810921&partnerID=8YFLogxK
U2 - 10.23919/DATE54114.2022.9774775
DO - 10.23919/DATE54114.2022.9774775
M3 - Conference contribution
AN - SCOPUS:85130810921
T3 - Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
SP - 382
EP - 387
BT - Proceedings of the 2022 Design, Automation and Test in Europe Conference and Exhibition, DATE 2022
A2 - Bolchini, Cristiana
A2 - Verbauwhede, Ingrid
A2 - Vatajelu, Ioana
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 March 2022 through 23 March 2022
ER -