TY - JOUR
T1 - Hardware and compiler-directed cache coherence in large-scale multiprocessors
T2 - design considerations and performance study
AU - Choi, Lynn
AU - Yew, Pen Chung
N1 - Funding Information:
The research described in this paper was supported in part by NSF Grants No. MIP 89-20891 and MIP 93-07910. The authors wish to thank several friends and colleagues at the University of Illinois who participated in the review of an early version of this paper, particularly Vijay Karamcheti for his thoughtful comments, and Tae Hyung Kim of the University of Maryland for his early comments and encouragement. The authors express thanks to Prof. Sang Lyul Min of Seoul National University and Prof. David Lilja of the University of Minnesota for proofreading the paper and for their valuable comments. Special thanks go to David Poulsen of Kuck and Associates, Inc., for his extensive help with the development of execution-driven simulations, and to Lawrence Rauchwerger and IBM for providing RS6000 clusters for simulations.
PY - 2000
Y1 - 2000
AB - In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. The scheme can be adapted to various cache organizations, including multiword cache lines and byte-addressable architectures. Several system-related issues, including critical sections, interthread communication, and task migration, have also been addressed. The cost of the required hardware support is minimal and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data flow analysis, have been implemented on the Polaris parallelizing compiler [34]. From our simulation study using the Perfect Club benchmarks [5], we found that, in spite of the conservative analysis made by the compiler, for four of six benchmark programs tested, the proposed HSCD scheme outperforms the full-map hardware directory scheme by up to 70 percent, while the hardware scheme outperforms the HSCD scheme in the remaining two applications by up to 89 percent. Given its comparable performance and reduced hardware cost, the proposed scheme can be a viable alternative for large-scale multiprocessors such as the Cray T3D, which rely on users to maintain data coherence.
UR - http://www.scopus.com/inward/record.url?scp=0033707855&partnerID=8YFLogxK
U2 - 10.1109/71.850834
DO - 10.1109/71.850834
M3 - Article
AN - SCOPUS:0033707855
SN - 1045-9219
VL - 11
SP - 375
EP - 394
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 4
ER -