Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline

Jaegeun Oh, Seok Joong Hwang, Huong Giang Nguyen, Areum Kim, Seon Wook Kim, Chulwoo Kim, Jong Kook Kim

    Research output: Contribution to journal › Article › peer-review

    5 Citations (Scopus)

    Abstract

    In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and simultaneous multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler.
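
    The sharing scheme described in the abstract can be illustrated with a small software model. The sketch below is an assumption-laden toy, not the authors' hardware design: the four-instruction mini-ISA, the `Lane` class, and the functions `decode`, `execute`, `run_lockstep`, and `demo` are all hypothetical names introduced here. It only shows the central idea that each instruction is fetched and decoded once, and the decoded result is then applied to every thread's private execution state.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ISA for illustration only; this is NOT the AE32000C
# instruction set, and the model ignores pipelining and timing entirely.


@dataclass
class Lane:
    """Private per-thread state (stands in for execute/memory/write-back)."""
    mem: list
    regs: list = field(default_factory=lambda: [0] * 4)


def decode(word):
    """Shared decode stage: split an instruction word into opcode and operands."""
    op, *operands = word.split()
    return op, [int(x) for x in operands]


def execute(lane, op, args):
    """Per-lane stages: apply one decoded instruction to one thread's state."""
    if op == "load":        # load rd, addr
        rd, addr = args
        lane.regs[rd] = lane.mem[addr]
    elif op == "addi":      # addi rd, rs, imm
        rd, rs, imm = args
        lane.regs[rd] = lane.regs[rs] + imm
    elif op == "mul":       # mul rd, rs, rt
        rd, rs, rt = args
        lane.regs[rd] = lane.regs[rs] * lane.regs[rt]
    elif op == "store":     # store rs, addr
        rs, addr = args
        lane.mem[addr] = lane.regs[rs]
    else:
        raise ValueError(f"unknown opcode: {op}")


def run_lockstep(program, lanes):
    """Fetch and decode each instruction once, then execute it on every lane."""
    fetches = 0
    for word in program:            # shared fetch
        fetches += 1
        op, args = decode(word)     # shared decode
        for lane in lanes:          # duplicated back-end stages
            execute(lane, op, args)
    return fetches


# Loop body computing x * (x + 1), where x is each thread's own input value.
PROGRAM = ["load 0 0", "addi 1 0 1", "mul 2 0 1", "store 2 1"]


def demo(ways):
    """Run PROGRAM on `ways` lanes whose inputs are 1, 2, ..., ways."""
    lanes = [Lane(mem=[i + 1, 0]) for i in range(ways)]
    fetches = run_lockstep(PROGRAM, lanes)
    return fetches, [lane.mem[1] for lane in lanes]
```

    In this model, `demo(2)` and `demo(4)` both report four fetches, because the fetch count depends only on the program length and not on the number of lanes. That is the property the abstract relies on when it argues for savings over a design that replicates the whole pipeline per thread.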

    Original language: English
    Pages (from-to): 576-586
    Number of pages: 11
    Journal: ETRI Journal
    Volume: 30
    Issue number: 4
    Publication status: Published - 2008 Aug

    Keywords

    • CMP
    • ILP
    • MLEP
    • SMT
    • TLP

    ASJC Scopus subject areas

    • Electronic, Optical and Magnetic Materials
    • General Computer Science
    • Electrical and Electronic Engineering
