Replicated process allocation for load distribution in fault-tolerant multicomputers

Jong Kim, Heejo Lee, Sunggu Lee

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

We consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load both before and after faults begin to degrade system performance. To tolerate a single fault, each process (the primary process) is duplicated, i.e., it has a backup process. The backup executes on a different processor from its primary, checkpointing the primary and recovering the process if the primary fails. We formalize the problem of load-balancing process allocation, propose a new process allocation method, and analyze its performance. Simulations compare the proposed method with a process allocation method that does not take into account the different load characteristics of primary and backup processes. While both methods perform well before a fault occurs, only the proposed method maintains a balanced load afterward.
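The effect described in the abstract can be illustrated with a small sketch. It is not the paper's allocation method: it only evaluates per-processor load for a given placement, under assumptions that are ours, not the authors'. Those assumptions are a three-processor system, a primary load of 10.0 and a backup (checkpointing) load of 1.0 per process, and a fault model in which a backup takes on the full primary load when its primary's processor fails. The names `Process`, `loads_before_fault`, `loads_after_fault`, `clustered`, and `spread` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    primary_load: float  # assumed load of the running primary
    backup_load: float   # assumed (smaller) load of the checkpointing backup
    primary_node: int
    backup_node: int     # must differ from primary_node


def loads_before_fault(procs, n_nodes):
    """Per-node load while every node is healthy."""
    loads = [0.0] * n_nodes
    for p in procs:
        if p.primary_node == p.backup_node:
            raise ValueError("primary and backup must be on different nodes")
        loads[p.primary_node] += p.primary_load
        loads[p.backup_node] += p.backup_load
    return loads


def loads_after_fault(procs, n_nodes, failed):
    """Per-node load on the surviving nodes after node `failed` goes down."""
    loads = {node: 0.0 for node in range(n_nodes) if node != failed}
    for p in procs:
        if p.primary_node == failed:
            # Assumption: the backup takes over and now carries the primary load.
            loads[p.backup_node] += p.primary_load
        elif p.backup_node == failed:
            # The primary keeps running; its backup is lost with the failed node.
            loads[p.primary_node] += p.primary_load
        else:
            loads[p.primary_node] += p.primary_load
            loads[p.backup_node] += p.backup_load
    return loads


# Two placements of six processes (two primaries per node) on three nodes.
# clustered: both backups of a node's primaries sit on the same neighbour.
clustered = [Process(10.0, 1.0, n, (n + 1) % 3) for n in range(3) for _ in range(2)]
# spread: the backups of a node's primaries are split across the other nodes.
spread = [Process(10.0, 1.0, n, (n + k) % 3) for n in range(3) for k in (1, 2)]

print(loads_before_fault(clustered, 3))    # [22.0, 22.0, 22.0]
print(loads_before_fault(spread, 3))       # [22.0, 22.0, 22.0]
print(loads_after_fault(clustered, 3, 0))  # {1: 40.0, 2: 22.0}
print(loads_after_fault(spread, 3, 0))     # {1: 31.0, 2: 31.0}
```

Under these assumed numbers both placements look identical before a fault, but only the one that spreads backups keeps the surviving processors evenly loaded after processor 0 fails.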

Original language: English
Pages (from-to): 499-505
Number of pages: 7
Journal: IEEE Transactions on Computers
Volume: 46
Issue number: 4
Publication status: Published - 1997
Externally published: Yes

Bibliographical note

Funding Information:
This research was supported in part by KOSEF under Grant 941-0900-055-2 and ETRI under Contract 94231. A preliminary version of this paper was presented at the 25th FTCS.

Keywords

  • Backup process
  • Checkpointing
  • Fault-tolerant multicomputer
  • Load balancing
  • Process allocation

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics
