Management of fault tolerance information for coordinated checkpointing protocol without sympathetic rollbacks

Kwang Sik Chung, Young Jun Lee, Heon Chang Yu, Won Gyu Lee

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


This paper presents the conditions for an extended global recovery line in coordinated checkpointing protocols, together with a new garbage collection protocol for checkpoints and message logs that avoids the sympathetic rollbacks caused by lost messages. Because previous work assumed that the communication channel never loses in-transit messages, earlier garbage collection schemes for coordinated checkpointing protocols delete every checkpoint except the last one on each process. However, even when a coordinated checkpointing protocol runs over a reliable communication protocol (TCP), in-transit messages are lost when a failure occurs, and those lost messages lead to sympathetic rollbacks of the faulty processes or of related processes. Fault tolerance information therefore needs management methods that store and delete coordinated checkpoints and a light message log so that sympathetic rollbacks are avoided. In this paper, we define the extended global recovery line conditions for garbage collection of checkpoints and of the message logs kept for lost messages, and we present a new garbage collection algorithm that operates within the extended global recovery line. The proposed algorithm piggybacks process information on each message, so no additional messages are needed for garbage collection or for determining the extended global recovery line. Because it relies on checkpoint information piggybacked on communication messages, the proposed algorithm is called 'the lazy garbage collection algorithm'.
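The piggybacking idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given here; it is a minimal illustration under stated assumptions: channels are FIFO, each sender numbers its messages per receiver, and all names (`Message`, `Process`, `send_log`, `covered`) are hypothetical. Each process keeps a log of the messages it has sent, and every outgoing message carries the highest sequence number from each peer that the sender's latest checkpoint already covers. A process prunes its own log only when such information happens to arrive on a regular message, so no dedicated garbage collection messages are exchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int
    receiver: int
    seq: int        # per-(sender, receiver) sequence number (assumed)
    payload: str
    covered: dict   # piggybacked: peer pid -> highest seq saved in sender's last checkpoint


@dataclass
class Process:
    pid: int
    next_seq: dict = field(default_factory=dict)  # receiver pid -> last seq used
    send_log: dict = field(default_factory=dict)  # receiver pid -> {seq: payload}
    received: dict = field(default_factory=dict)  # sender pid -> highest seq received
    covered: dict = field(default_factory=dict)   # sender pid -> highest seq in last checkpoint

    def send(self, receiver: int, payload: str) -> Message:
        seq = self.next_seq.get(receiver, 0) + 1
        self.next_seq[receiver] = seq
        # Log the message so it can be replayed if it is lost in transit.
        self.send_log.setdefault(receiver, {})[seq] = payload
        # Piggyback a copy of the checkpoint coverage on the message itself.
        return Message(self.pid, receiver, seq, payload, dict(self.covered))

    def receive(self, msg: Message) -> None:
        self.received[msg.sender] = max(self.received.get(msg.sender, 0), msg.seq)
        # Lazy garbage collection: the peer's checkpoint covers our messages
        # up to `stable`, so those log entries are no longer needed.
        stable = msg.covered.get(self.pid, 0)
        log = self.send_log.get(msg.sender, {})
        for seq in [s for s in log if s <= stable]:
            del log[seq]

    def checkpoint(self) -> None:
        # Everything received so far is now part of the saved state.
        self.covered = dict(self.received)


p0, p1 = Process(0), Process(1)
p1.receive(p0.send(1, "a"))
p1.receive(p0.send(1, "b"))
p1.checkpoint()                 # p1's checkpoint covers messages 1 and 2 from p0
p1.receive(p0.send(1, "c"))     # message 3 arrives after the checkpoint
p0.receive(p1.send(0, "ack"))   # an ordinary message carries the coverage back
print(sorted(p0.send_log[1]))   # [3]
```

In this sketch the log entry for message 3 survives because no checkpoint of `p1` covers it yet; it would be pruned only after a later checkpoint and a later message from `p1`. Pruning of old checkpoints, which the paper also addresses, is omitted.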

Original language: English
Pages (from-to): 379-390
Number of pages: 12
Journal: Journal of Information Science and Engineering
Issue number: 2
Publication status: Published - 2004 Mar


  • Coordinated checkpointing protocol
  • Garbage collection
  • Message log
  • Sympathetic rollback

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Hardware and Architecture
  • Library and Information Sciences
  • Computational Theory and Mathematics

