Abstract
We consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load both before and after faults begin to degrade system performance. To tolerate a single fault, each process (the primary process) is duplicated, i.e., paired with a backup process. The backup process executes on a different processor from its primary, checkpoints the primary process, and recovers the process if the primary fails. We formalize the load-balancing process allocation problem, propose a new process allocation method, and analyze its performance. Simulations compare the proposed method with a process allocation method that does not take into account the different load characteristics of primary and backup processes. While both methods perform well before a fault occurs, only the proposed method maintains a balanced load afterward.
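The problem setting in the abstract can be illustrated with a small sketch. This is not the paper's allocation method, which the abstract does not specify. It is a toy model under stated assumptions: a primary costs 10 load units and a backup costs 1 (both values are invented for illustration), a promoted backup takes on the full primary load, and a lost backup is not re-created. The function names and the two placement strategies (`neighbor_allocation`, `spread_allocation`) are hypothetical.

```python
from typing import List, Tuple

# Assumed, illustrative load units: a backup only checkpoints,
# so it is taken to be much lighter than a primary.
PRIMARY_LOAD = 10
BACKUP_LOAD = 1

# One (primary_processor, backup_processor) pair per process.
Allocation = List[Tuple[int, int]]


def loads_before_fault(alloc: Allocation, n: int) -> List[int]:
    """Per-processor load while all n processors are healthy."""
    loads = [0] * n
    for primary, backup in alloc:
        assert primary != backup, "backup must run on a different processor"
        loads[primary] += PRIMARY_LOAD
        loads[backup] += BACKUP_LOAD
    return loads


def loads_after_fault(alloc: Allocation, n: int, failed: int) -> List[int]:
    """Per-processor load after processor `failed` has crashed."""
    loads = [0] * n
    for primary, backup in alloc:
        if primary == failed:
            # The backup takes over and now carries the full primary load.
            loads[backup] += PRIMARY_LOAD
        elif backup == failed:
            # The primary keeps running; its backup is lost (assumption).
            loads[primary] += PRIMARY_LOAD
        else:
            loads[primary] += PRIMARY_LOAD
            loads[backup] += BACKUP_LOAD
    return loads


def neighbor_allocation(n: int, k: int) -> Allocation:
    """k primaries per processor, all backups on the next processor."""
    return [(i, (i + 1) % n) for i in range(n) for _ in range(k)]


def spread_allocation(n: int, k: int) -> Allocation:
    """k primaries per processor, backups rotated over the other n - 1."""
    return [(i, (i + 1 + j % (n - 1)) % n) for i in range(n) for j in range(k)]


if __name__ == "__main__":
    for name, make in [("neighbor", neighbor_allocation),
                       ("spread", spread_allocation)]:
        alloc = make(4, 3)
        print(name, loads_before_fault(alloc, 4), loads_after_fault(alloc, 4, 0))
```

With 4 processors and 3 primaries each, both placements give every processor the same load before a fault. After processor 0 fails, the neighbor placement puts all promoted backups on processor 1, while the spread placement divides them evenly among the survivors. This mirrors the abstract's point that balance before a fault does not imply balance after one.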
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 499-505 |
Number of pages | 7 |
Journal | IEEE Transactions on Computers |
Volume | 46 |
Issue number | 4 |
Publication status | Published - 1997 |
Externally published | Yes |
Bibliographical note
Funding Information: This research was supported in part by KOSEF under Grant 941-0900-055-2 and ETRI under Contract 94231. A preliminary version of this paper was presented at the 25th FTCS.
Keywords
- Backup process
- Checkpointing
- Fault-tolerant multicomputer
- Load balancing
- Process allocation
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics