TY - GEN
T1 - OpenMP directive extension for BlackFin 561 dual core processor
AU - Seo, Hee
AU - Kim, Seon Wook
PY - 2006
Y1 - 2006
N2 - Many researchers and vendors are exploiting the increasing number of transistors to build chip multiprocessors (CMPs) by partitioning a chip into multiple simple ILP cores. As in traditional multiprocessors, CMPs extract thread-level parallelism (TLP) from programs by running multiple independent program segments, i.e., threads, in parallel. CMPs are now widely used in high performance servers, and even in embedded systems. In this paper, we present an extension of the OpenMP shared directive for performance optimization on BlackFin 561 (ADSP-BF561) dual core processors. Many architectures have been proposed to support memory consistency between multiple cores. On a dual core processor such as the ADSP-BF561, each core has its own private L1 cache, and both cores share an L2 cache. To execute multithreaded parallel programs, we need to consider carefully where to allocate shared variables on the targeted memory architecture. We could improve the speedup by up to 107% and reduce the energy consumption by up to 108% in our measured benchmarks compared with not using our extension.
AB - Many researchers and vendors are exploiting the increasing number of transistors to build chip multiprocessors (CMPs) by partitioning a chip into multiple simple ILP cores. As in traditional multiprocessors, CMPs extract thread-level parallelism (TLP) from programs by running multiple independent program segments, i.e., threads, in parallel. CMPs are now widely used in high performance servers, and even in embedded systems. In this paper, we present an extension of the OpenMP shared directive for performance optimization on BlackFin 561 (ADSP-BF561) dual core processors. Many architectures have been proposed to support memory consistency between multiple cores. On a dual core processor such as the ADSP-BF561, each core has its own private L1 cache, and both cores share an L2 cache. To execute multithreaded parallel programs, we need to consider carefully where to allocate shared variables on the targeted memory architecture. We could improve the speedup by up to 107% and reduce the energy consumption by up to 108% in our measured benchmarks compared with not using our extension.
UR - http://www.scopus.com/inward/record.url?scp=34547265718&partnerID=8YFLogxK
U2 - 10.1109/CIT.2006.131
DO - 10.1109/CIT.2006.131
M3 - Conference contribution
AN - SCOPUS:34547265718
SN - 076952687X
SN - 9780769526874
T3 - Proceedings - Sixth IEEE International Conference on Computer and Information Technology, CIT 2006
SP - 49
EP - 54
BT - Proceedings - Sixth IEEE International Conference on Computer and Information Technology, CIT 2006
PB - IEEE Computer Society
T2 - 6th IEEE International Conference on Computer and Information Technology, CIT 2006
Y2 - 20 September 2006 through 22 September 2006
ER -