Abstract
The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory, and many studies and real-world applications build on ODP-based classifiers. However, existing approaches develop these classifiers using the traditional bag-of-words representation of text, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method uses a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of the bag-of-phrases representation. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
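The abstract does not specify how phrases are extracted or selected, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy constituency tree encoded as nested tuples, treats every noun-phrase (`NP`) subtree as a candidate phrase, and uses a simple document-frequency threshold as a stand-in for the phrase selection step. The function names, the `NP` label choice, and the `min_df` parameter are all hypothetical.

```python
from collections import Counter

def leaves(tree):
    """Return the words under a tree node, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # leaf node: (POS tag, word)
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def extract_phrases(tree, phrase_labels=("NP",)):
    """Collect the lowercased word sequence of every subtree whose label
    is in phrase_labels (assumption: noun phrases are the phrases of interest)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return []  # a single word is not a phrase here
    phrases = []
    if label in phrase_labels:
        phrases.append(" ".join(leaves(tree)).lower())
    for child in children:
        phrases.extend(extract_phrases(child, phrase_labels))
    return phrases

def select_phrases(docs_phrases, min_df=2):
    """Assumed selection rule: keep phrases that occur in at least
    min_df documents, shrinking the phrase vocabulary."""
    df = Counter()
    for phrases in docs_phrases:
        df.update(set(phrases))
    return {p for p, n in df.items() if n >= min_df}

def bag_of_phrases(phrases, vocabulary):
    """Count only the phrases that survived selection."""
    return Counter(p for p in phrases if p in vocabulary)

# Two hand-built parse trees standing in for parsed ODP text.
doc1 = ("S",
        ("NP", ("JJ", "open"), ("NN", "directory")),
        ("VP", ("VBZ", "lists"),
               ("NP", ("NN", "web"), ("NNS", "pages"))))
doc2 = ("S",
        ("NP", ("NN", "web"), ("NNS", "pages")),
        ("VP", ("VBP", "need"),
               ("NP", ("NN", "text"), ("NN", "classification"))))

all_phrases = [extract_phrases(d) for d in (doc1, doc2)]
vocabulary = select_phrases(all_phrases, min_df=2)
features = [bag_of_phrases(p, vocabulary) for p in all_phrases]
```

In this toy setup only the phrase shared by both documents survives selection, which illustrates how a selection step can cut the dimensionality of a bag-of-phrases representation. A real pipeline would obtain the trees from a syntactic parser and would likely use a more discriminative selection criterion.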
| Original language | English |
|---|---|
| Title of host publication | 2016 International Conference on Big Data and Smart Computing, BigComp 2016 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 485-488 |
| Number of pages | 4 |
| ISBN (Electronic) | 9781467387965 |
| Publication status | Published - 2016 Mar 3 |
| Event | International Conference on Big Data and Smart Computing, BigComp 2016, Hong Kong, China. Duration: 2016 Jan 18 → 2016 Jan 20 |
Other
| Other | International Conference on Big Data and Smart Computing, BigComp 2016 |
|---|---|
| Country/Territory | China |
| City | Hong Kong |
| Period | 16/1/18 → 16/1/20 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- open directory project
- syntactic structure
- text classification
- text mining
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Information Systems and Management
Fingerprint
Dive into the research topics of 'Bringing bag-of-phrases to ODP-based text classification'. Together they form a unique fingerprint.