Low-cost fault-tolerance protocol for large-scale network monitoring

Jin Ho Ahn, Sung Gi Min, Young Il Choi, Byung Sun Lee

    Research output: Chapter in Book/Report/Conference proceeding > Chapter

    Abstract

    A distributed hierarchical network monitoring model has been proposed to solve the scalability problem of the centralized model. In this distributed model, a top-level monitoring manager, called the main manager, obtains aggregate management information from mid-level managers, called domain managers, forming a hierarchical structure. However, if some of the monitoring managers crash, network elements cannot be continuously and correctly monitored until those managers are repaired. To address this important but previously unresolved issue, this paper presents a new fault-tolerance protocol for domain managers, named DMFTP, which allows the managers to efficiently exploit their organizational structure. As a result, the protocol can minimize failure detection overhead and the number of live managers affected by each manager node crash. It also tolerates concurrent manager failures and, after the failed managers have been repaired, ensures their immediate and consistent recovery.

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Editors: Peter M.A. Sloot, David Abramson, Alexander V. Bogdanov, Yuriy E. Gorbachev, Jack J. Dongarra, Albert Y. Zomaya
    Publisher: Springer Verlag
    Pages: 504-513
    Number of pages: 10
    ISBN (Print): 9783540401964
    Publication status: Published - 2003

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 2659
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
