TY - JOUR
T1 - Reducing the delay for decoding instructions by predicting their source register operands
AU - Park, Sanghyun
AU - Jun, Jaeyung
AU - Kim, Changhyun
AU - Min, Gyeongil
AU - Lee, Hun Jae
AU - Kim, Seonwook
N1 - Funding Information:
Acknowledgments: This work was supported by the IT R&D program of MOTIE/KEIT [10052716, Design technology development of ultra-low voltage operating circuit and IP for smart sensor SoC] and the Competency Development Program for Industry Specialists of the Korean Ministry of Trade, Industry and Energy (MOTIE), operated by Korea Institute for Advancement of Technology (KIAT) (No. N0001883, HRD Program for Intelligent Semiconductor Industry). The chip fabrication and EDA tool for this work were supported by the IC Design Education Center (IDEC), Korea.
Funding Information:
This work was supported by the IT R&D program of MOTIE/KEIT [10052716, Design technology development of ultra-low voltage operating circuit and IP for smart sensor SoC] and the Competency Development Program for Industry Specialists of the Korean Ministry of Trade, Industry and Energy (MOTIE), operated by Korea Institute for Advancement of Technology (KIAT) (No. N0001883, HRD Program for Intelligent Semiconductor Industry). The chip fabrication and EDA tool for this work were supported by the IC Design Education Center (IDEC), Korea.
Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2020/5
Y1 - 2020/5
N2 - The fetched instructions would have data dependency with in-flight ones in the pipeline execution of a processor, so the dependency prevents the processor from executing the incoming instructions for guaranteeing the program’s correctness. The register and memory dependencies are detected in the decode and memory stages, respectively. In a small embedded processor that supports as many ISAsas possible to reduce code size, the instruction decoding to identify register usage with the dependence check generally results in long delay and sometimes a critical path in its implementation. For reducing the delay, this paper proposes two methods—One method assumes the widely used source register operand bit-fields without fully decoding the instructions. However, this assumption would cause additional stalls due to the incorrect prediction; thus, it would degrade the performance. To solve this problem, as the other method, we adopt a table-based way to store the dependence history and later use this information for more precisely predicting the dependency. We applied our methods to the commercial EISC embedded processor with the Samsung 65nm process; thus, we reduced the critical path delay and increased its maximum operating frequency by 12.5% and achieved an average 11.4% speed-up in the execution time of the EEMBC applications. We also improved the static, dynamic power consumption, and EDP by 7.2%, 8.5%, and 13.6%, respectively, despite the implementation area overhead of 2.5%.
AB - The fetched instructions would have data dependency with in-flight ones in the pipeline execution of a processor, so the dependency prevents the processor from executing the incoming instructions for guaranteeing the program’s correctness. The register and memory dependencies are detected in the decode and memory stages, respectively. In a small embedded processor that supports as many ISAsas possible to reduce code size, the instruction decoding to identify register usage with the dependence check generally results in long delay and sometimes a critical path in its implementation. For reducing the delay, this paper proposes two methods—One method assumes the widely used source register operand bit-fields without fully decoding the instructions. However, this assumption would cause additional stalls due to the incorrect prediction; thus, it would degrade the performance. To solve this problem, as the other method, we adopt a table-based way to store the dependence history and later use this information for more precisely predicting the dependency. We applied our methods to the commercial EISC embedded processor with the Samsung 65nm process; thus, we reduced the critical path delay and increased its maximum operating frequency by 12.5% and achieved an average 11.4% speed-up in the execution time of the EEMBC applications. We also improved the static, dynamic power consumption, and EDP by 7.2%, 8.5%, and 13.6%, respectively, despite the implementation area overhead of 2.5%.
KW - Data dependency
KW - Decoding
KW - ISAs
KW - Register operand detection
UR - http://www.scopus.com/inward/record.url?scp=85085128055&partnerID=8YFLogxK
U2 - 10.3390/electronics9050820
DO - 10.3390/electronics9050820
M3 - Article
AN - SCOPUS:85085128055
SN - 2079-9292
VL - 9
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 5
M1 - 820
ER -