Abstract
The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory. Many studies and real-world applications build on ODP-based classifiers. However, existing approaches develop these classifiers with the traditional bag-of-words representation of text, and individual words do not always constitute atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing a bag-of-phrases representation to ODP-based text classification. The proposed method uses syntactic trees to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of the bag-of-phrases representation. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
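As a rough illustration of the general idea only, the sketch below extracts multi-word phrases from a toy syntactic tree and then prunes the phrase vocabulary. It is not the paper's implementation: the nested-tuple tree format, the choice of noun-phrase (`NP`) subtrees, the document-frequency threshold `min_df`, and all function names are assumptions made for this example, and the abstract does not specify the actual phrase selection criterion.

```python
from collections import Counter

# Assumed toy format: a parse tree is a nested tuple (label, child, ...);
# leaves are plain strings. A real system would obtain trees from a parser.

def leaves(tree):
    """Return the words under a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def extract_phrases(tree, labels=("NP",)):
    """Collect the lowercased word sequence under every subtree labeled in `labels`."""
    if isinstance(tree, str):
        return []
    phrases = []
    if tree[0] in labels:
        words = leaves(tree)
        if len(words) > 1:  # keep multi-word units only
            phrases.append(" ".join(words).lower())
    for child in tree[1:]:
        phrases.extend(extract_phrases(child, labels))
    return phrases

def select_phrases(docs_phrases, min_df=2):
    """Hypothetical selection rule: keep phrases seen in at least `min_df` documents."""
    df = Counter()
    for phrases in docs_phrases:
        df.update(set(phrases))
    return {p for p, n in df.items() if n >= min_df}

def bag_of_phrases(phrases, vocabulary):
    """Count only the phrases that survived selection."""
    return Counter(p for p in phrases if p in vocabulary)

doc1 = ("S",
        ("NP", ("DT", "The"), ("JJ", "open"), ("NN", "directory")),
        ("VP", ("VBZ", "lists"), ("NP", ("NN", "web"), ("NNS", "pages"))))
doc2 = ("S",
        ("NP", ("NN", "Web"), ("NNS", "pages")),
        ("VP", ("VBP", "need"), ("NP", ("NN", "text"), ("NN", "classification"))))

all_phrases = [extract_phrases(d) for d in (doc1, doc2)]
vocabulary = select_phrases(all_phrases, min_df=2)
features = [bag_of_phrases(p, vocabulary) for p in all_phrases]
```

In this toy run only "web pages" occurs in both documents, so it is the single phrase kept as a feature. This shows how a selection step can shrink the feature space that raw phrase extraction would otherwise produce.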
Original language | English |
---|---|
Title of host publication | 2016 International Conference on Big Data and Smart Computing, BigComp 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 485-488 |
Number of pages | 4 |
ISBN (Electronic) | 9781467387965 |
Publication status | Published - 2016 Mar 3 |
Event | International Conference on Big Data and Smart Computing, BigComp 2016 - Hong Kong, China; Duration: 2016 Jan 18 → 2016 Jan 20 |
Publication series
Name | 2016 International Conference on Big Data and Smart Computing, BigComp 2016 |
---|
Other
Other | International Conference on Big Data and Smart Computing, BigComp 2016 |
---|---|
Country/Territory | China |
City | Hong Kong |
Period | 2016 Jan 18 → 2016 Jan 20 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- open directory project
- syntactic structure
- text classification
- text mining
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Information Systems and Management