Abstract
HTML is the markup language underlying a vast number of web pages. Parallelizing an HTML parser could yield a substantial performance improvement and a better user experience. However, parallelizing the parser is challenging because of a strong cyclic dependence in the parser model. In this paper, we propose a semantic-based parallel HTML parser design that splits the input HTML document at 'div' tags and processes the resulting independent partial inputs with multiple parser threads. We evaluated the proposed parallel HTML parser on benchmarks selected from the top 500 web pages and achieved a maximum speedup of 1.49x.
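The split-then-parse idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how the split points are found or how partial results are merged, so the helper names (`split_top_level_divs`, `parse_chunk`, `parse_parallel`) and the choice to split at top-level 'div' elements are assumptions made for illustration. The sketch also ignores markup outside 'div' elements and would be confused by 'div' text inside comments or scripts. In standard CPython the global interpreter lock prevents these threads from running the parsing in parallel, so the code shows only the structure of the approach, not its speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
import re

# Matches opening and closing div tags; group 1 is "/" for a closing tag.
DIV_TOKEN = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def split_top_level_divs(html):
    """Return the source text of each top-level <div>...</div> element.

    Hypothetical splitter: tracks nesting depth so that nested divs stay
    inside the chunk of their outermost ancestor.
    """
    chunks, depth, start = [], 0, None
    for match in DIV_TOKEN.finditer(html):
        if match.group(1) == "":          # opening tag
            if depth == 0:
                start = match.start()
            depth += 1
        elif depth > 0:                   # closing tag for an open div
            depth -= 1
            if depth == 0:
                chunks.append(html[start:match.end()])
    return chunks


class TagCollector(HTMLParser):
    """Stand-in for a real parser thread: records start tags in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_chunk(chunk):
    """Parse one independent partial input with its own parser instance."""
    parser = TagCollector()
    parser.feed(chunk)
    parser.close()
    return parser.tags


def parse_parallel(html, workers=4):
    """Split the document at top-level divs and parse each chunk in a thread."""
    chunks = split_top_level_divs(html)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results line up with the document.
        return list(pool.map(parse_chunk, chunks))
```

Each chunk gets its own parser instance, which is what makes the partial inputs independent; a real design would additionally need to merge the per-chunk results into one document tree.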
| Original language | English |
|---|---|
| Title of host publication | International Conference on Electronics, Information, and Communications, ICEIC 2016 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781467380164 |
| Publication status | Published - 2016 Sept 7 |
| Event | 15th International Conference on Electronics, Information, and Communications, ICEIC 2016 - Danang, Viet Nam; Duration: 2016 Jan 27 → 2016 Jan 30 |
Other
| Other | 15th International Conference on Electronics, Information, and Communications, ICEIC 2016 |
|---|---|
| Country/Territory | Viet Nam |
| City | Danang |
| Period | 2016 Jan 27 → 2016 Jan 30 |
Keywords
- HTML
- multithread
- parallelizing
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Control and Systems Engineering