Data deduplication using dynamic chunking algorithm

Young Chan Moon, Ho Min Jung, Chuck Yoo, Young Woong Ko

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    6 Citations (Scopus)

    Abstract

    Data deduplication is widely used in storage systems to avoid storing duplicated data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem and performs poorly when handling files that contain duplicated data. The key idea of this work is to exploit the duplicated-data information contained in the file similarity information. Duplicated points can be found easily by comparing hash key values and file offsets within the file similarity information. We treat these duplicated points as hints for the starting position of chunking. With this approach, we can significantly improve the performance of a data deduplication system that uses fixed-length chunking. In our experiments, the proposed dynamic chunking significantly improves deduplication capability while its processing time remains comparable to that of fixed-length chunking.
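
The boundary-shift problem and the hint idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper locates duplicated points from file similarity information (hash key values and file offsets), whereas this sketch substitutes a brute-force scan for the first window that matches a stored chunk. The names `fixed_chunks`, `find_duplicated_point`, `dedup_ratio`, and the 8-byte chunk size are all hypothetical choices made for illustration.

```python
import hashlib

CHUNK = 8  # illustrative chunk size in bytes (assumption; real systems use far larger chunks)


def digest(block: bytes) -> str:
    """Hash key of a chunk."""
    return hashlib.sha1(block).hexdigest()


def fixed_chunks(data: bytes, start: int = 0, size: int = CHUNK) -> list:
    """Fixed-length chunking that begins at `start`.

    Any bytes before `start` are emitted as one leading chunk.
    """
    chunks = [data[:start]] if start > 0 else []
    for off in range(start, len(data), size):
        chunks.append(data[off:off + size])
    return chunks


def find_duplicated_point(data: bytes, known: set, size: int = CHUNK) -> int:
    """Return the first offset whose window matches a stored chunk, else 0.

    Assumption: this brute-force scan stands in for the paper's lookup in
    file similarity information, which is not reproduced here.
    """
    for off in range(len(data) - size + 1):
        if digest(data[off:off + size]) in known:
            return off
    return 0


def dedup_ratio(chunks: list, known: set) -> float:
    """Fraction of bytes that fall in chunks already present in the store."""
    total = sum(len(c) for c in chunks)
    dup = sum(len(c) for c in chunks if digest(c) in known)
    return dup / total if total else 0.0


original = bytes(range(64))
modified = b"\xff\xff\xff" + original  # three inserted bytes shift every boundary

stored = {digest(c) for c in fixed_chunks(original)}

# Plain fixed-length chunking: no chunk of the modified file lines up with a stored one.
plain_ratio = dedup_ratio(fixed_chunks(modified), stored)

# Dynamic chunking: restart fixed-length chunking at the duplicated point.
hint = find_duplicated_point(modified, stored)
dynamic_ratio = dedup_ratio(fixed_chunks(modified, start=hint), stored)

print(plain_ratio, hint, dynamic_ratio)
```

In this toy input, chunking from offset 0 finds no duplicates after the insertion, while restarting at the hinted offset realigns every later chunk with the stored data. The scan used here costs one hash per byte offset, so it does not reflect the processing time reported in the paper.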

    Original language: English
    Title of host publication: Computational Collective Intelligence
    Subtitle of host publication: Technologies and Applications - 4th International Conference, ICCCI 2012, Proceedings
    Pages: 59-68
    Number of pages: 10
    Edition: PART 2
    Publication status: Published - 2012
    Event: 4th International Conference on Computational Collective Intelligence, ICCCI 2012 - Ho Chi Minh City, Viet Nam
    Duration: 2012 Nov 28 to 2012 Nov 30

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Number: PART 2
    Volume: 7654 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 4th International Conference on Computational Collective Intelligence, ICCCI 2012
    Country/Territory: Viet Nam
    City: Ho Chi Minh City
    Period: 12/11/28 to 12/11/30

    Keywords

    • Chunking algorithm
    • Deduplication
    • File similarity
    • Metadata

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
