Abstract
The presence of procedures and procedure calls introduces side effects that complicate stale reference detection in compiler-directed cache coherence schemes. Previous compiler algorithms either invalidate the cache at procedure boundaries or rely on inlining to avoid marking references interprocedurally. We introduce a full interprocedural algorithm that performs bottom-up and top-down analysis on the procedure call graph. This avoids unnecessary cache misses for subroutine-local data and exploits locality across procedure boundaries. Execution-driven simulations on the Perfect benchmarks demonstrate that the interprocedural algorithm eliminates up to 36.8% of the cache misses for a compiler-directed scheme compared to an existing invalidation-based algorithm.
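The two passes over the call graph mentioned above can be illustrated with a toy sketch. It is not the paper's algorithm: the inputs (`calls`, `local_writes`), the function name `interprocedural_summaries`, and the flow-insensitive "may be stale on entry" rule are all hypothetical simplifications. The bottom-up pass visits callees before callers to build a transitive write summary for each procedure; the top-down pass visits callers before callees to push calling-context information down to each callee.

```python
from graphlib import TopologicalSorter


def interprocedural_summaries(calls, local_writes):
    """Toy two-pass call-graph analysis (hypothetical, not the paper's method).

    calls:        procedure -> list of callees (call graph assumed acyclic)
    local_writes: procedure -> set of shared arrays written in its own body
    """
    # Treating callees as "predecessors" makes static_order() emit every
    # callee before its callers, i.e. a bottom-up order of the call graph.
    # A recursive (cyclic) call graph raises graphlib.CycleError here.
    bottom_up = list(TopologicalSorter(calls).static_order())

    # Pass 1 (bottom-up): arrays a procedure may write, directly or via callees.
    mod = {}
    for proc in bottom_up:
        summary = set(local_writes.get(proc, set()))
        for callee in calls.get(proc, []):
            summary |= mod[callee]  # callee already summarized
        mod[proc] = summary

    # Pass 2 (top-down): crude, flow-insensitive set of arrays that may hold
    # stale cached copies on entry. A callee inherits its caller's entry set
    # plus everything the caller (transitively) writes.
    stale_in = {proc: set() for proc in bottom_up}
    for proc in reversed(bottom_up):  # callers before callees
        for callee in calls.get(proc, []):
            stale_in[callee] |= stale_in[proc] | mod[proc]
    return mod, stale_in


if __name__ == "__main__":
    calls = {"main": ["init", "solve"], "solve": ["smooth"]}
    local_writes = {"init": {"A"}, "solve": {"B"}, "smooth": {"C"}}
    mod, stale_in = interprocedural_summaries(calls, local_writes)
    print(mod["solve"], stale_in["smooth"])
```

The summary computed bottom-up is what lets a caller reason about a call site without inlining or invalidating the whole cache; a real analysis would be flow-sensitive and would handle recursion by collapsing strongly connected components or iterating to a fixed point.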
Original language | English |
---|---|
Title of host publication | Software |
Editors | K. Pingali |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 103-113 |
Number of pages | 11 |
ISBN (Electronic) | 081867623X |
Publication status | Published - 1996 |
Externally published | Yes |
Event | 25th International Conference on Parallel Processing, ICPP 1996, Ithaca, United States; duration 1996 Aug 12 → 1996 Aug 16 |
Publication series
Name | Proceedings of the International Conference on Parallel Processing |
---|---|
Volume | 3 |
ISSN (Print) | 0190-3918 |
Other
Other | 25th International Conference on Parallel Processing, ICPP 1996 |
---|---|
Country/Territory | United States |
City | Ithaca |
Period | 1996/8/12 → 1996/8/16 |
Bibliographical note
Funding Information:
We have implemented these algorithms on the Polaris parallelizing compiler [11] and demonstrated the performance of the new compiler algorithms by running execution-driven simulations of five Perfect benchmarks. The results show that, by avoiding cache invalidations, the intraprocedural algorithm eliminates up to 26.0% of the cache misses for a compiler-directed scheme compared to an existing invalidation-based algorithm [7]. With the full interprocedural analysis, up to an additional 10.8% of the cache misses can be removed.
Acknowledgments: The research described in this paper was supported in part by NSF Grants No. MIP 89-20891 and MIP 93-07910 and by ARPA contract #DABT63-95-C-0097. This work is not necessarily representative of the positions or policies of the Army or the Government. This work was performed while the first author was at the University of Illinois. We thank Hock-Beng Lim at the University of Illinois for his valuable comments.
Publisher Copyright:
© 1996 IEEE.
ASJC Scopus subject areas
- Software
- General Mathematics
- Hardware and Architecture