Design of HTML parallel parser with semantic-based input splitting

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    1 Citation (Scopus)

    Abstract

    HTML is a widely used markup language that underlies innumerable web pages. Parallelizing an HTML parser would lead to a considerable performance improvement and a better user experience. However, parallelizing the HTML parser is challenging because of a strong cyclic dependence in the parser model. In this paper, we propose a semantic-based HTML parallel parser design that splits the input HTML document at 'div' tags and processes the resulting independent partial inputs with multiple parser threads. We evaluated the proposed HTML parallel parser on benchmarks selected from the top 500 web pages and achieved a maximum speedup of 1.49x.
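
    The splitting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (split_by_top_level_div, parse_chunk, parallel_parse) and the TagCollector class are hypothetical, the regular expression ignores 'div' text inside comments or scripts, and Python's standard html.parser stands in for a browser-grade parser. It assumes each balanced top-level 'div' block can be parsed independently. In standard CPython the global interpreter lock generally prevents CPU-bound threads from running in parallel, so the sketch shows the partitioning and merge order only, not the reported speedup.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Hypothetical pattern: matches opening and closing div tags only.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def split_by_top_level_div(html):
    """Split html so that each balanced top-level <div>...</div> is one chunk.

    Text between top-level divs becomes its own chunk, so joining the
    chunks reproduces the input exactly.
    """
    chunks, depth, last = [], 0, 0
    for m in DIV_TAG.finditer(html):
        if not m.group(1):                      # opening <div>
            if depth == 0:
                if m.start() > last:
                    chunks.append(html[last:m.start()])
                last = m.start()                # chunk starts at this div
            depth += 1
        elif depth > 0:                         # closing </div> inside a div
            depth -= 1
            if depth == 0:
                chunks.append(html[last:m.end()])
                last = m.end()
    if last < len(html):                        # trailing text or unclosed div
        chunks.append(html[last:])
    return chunks


class TagCollector(HTMLParser):
    """Toy stand-in for a real parser: records start tags in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_chunk(chunk):
    parser = TagCollector()
    parser.feed(chunk)
    parser.close()
    return parser.tags


def parallel_parse(html, workers=4):
    """Parse each chunk on a worker thread and merge results in document order."""
    chunks = split_by_top_level_div(html)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(parse_chunk, chunks))  # map preserves input order
    return [tag for tags in results for tag in tags]
```

    For example, parallel_parse("<div><p>a</p></div><div><b>z</b></div>") hands two chunks to the pool and merges the per-chunk results back in document order.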

    Original language: English
    Title of host publication: International Conference on Electronics, Information, and Communications, ICEIC 2016
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    ISBN (Electronic): 9781467380164
    Publication status: Published - 2016 Sept 7
    Event: 15th International Conference on Electronics, Information, and Communications, ICEIC 2016 - Danang, Viet Nam
    Duration: 2016 Jan 27 → 2016 Jan 30

    Other

    Other: 15th International Conference on Electronics, Information, and Communications, ICEIC 2016
    Country/Territory: Viet Nam
    City: Danang
    Period: 16/1/27 → 16/1/30

    Keywords

    • HTML
    • multithread
    • parallelizing

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Control and Systems Engineering
