Abstract
Traditional text classification methods based on the Open Directory Project (ODP) use a bag-of-words approach, which considers only single words in ODP documents and ignores important semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents using Wikipedia knowledge. First, we construct a phrase dictionary from Wikipedia and search ODP documents for Wikipedia phrases. Second, for each Wikipedia phrase found in an ODP document, we select the Wikipedia articles and hyperlinks most likely to be relevant. Finally, we add the Wikipedia phrases and relevant hyperlinks to the ODP documents to enrich their semantic information. Our evaluation results verify the efficacy of the proposed methodology.
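The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary `PHRASE_LINKS`, the helper names `find_phrases` and `enrich`, and the greedy longest-match lookup are all assumptions made for illustration. The relevance selection described in the second step is not modeled here; every link stored for a matched phrase is simply appended.

```python
from typing import Dict, List

# Hypothetical toy phrase dictionary: lower-cased Wikipedia-style titles mapped to
# illustrative hyperlink targets. Real data would be extracted from a Wikipedia dump.
PHRASE_LINKS: Dict[str, List[str]] = {
    "machine learning": ["artificial intelligence", "statistics"],
    "support vector machine": ["machine learning", "kernel method"],
    "text classification": ["machine learning", "natural language processing"],
}
# Longest phrase length in tokens, used to bound the lookup window.
MAX_PHRASE_LEN = max(len(p.split()) for p in PHRASE_LINKS)


def find_phrases(tokens: List[str], phrase_links: Dict[str, List[str]],
                 max_len: int) -> List[str]:
    """Greedy longest-match scan for multi-word dictionary phrases."""
    found: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest window first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrase_links:
                found.append(candidate)
                match_len = n
                break
        # Skip past a matched phrase, otherwise advance one token.
        i += match_len if match_len else 1
    return found


def enrich(text: str, phrase_links: Dict[str, List[str]], max_len: int) -> List[str]:
    """Return bag-of-words tokens plus phrase and hyperlink features."""
    tokens = text.lower().split()
    features = list(tokens)
    for phrase in find_phrases(tokens, phrase_links, max_len):
        # Join multi-word features with underscores so they stay single features.
        features.append(phrase.replace(" ", "_"))
        features.extend(link.replace(" ", "_") for link in phrase_links[phrase])
    return features


if __name__ == "__main__":
    doc = "a support vector machine for text classification"
    print(enrich(doc, PHRASE_LINKS, MAX_PHRASE_LEN))
```

The enriched feature list keeps the original single-word tokens, so a downstream classifier still sees the bag-of-words signal alongside the added phrase and hyperlink features.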
Original language | English |
---|---|
Title of host publication | 32nd Annual ACM Symposium on Applied Computing, SAC 2017 |
Publisher | Association for Computing Machinery |
Pages | 309-314 |
Number of pages | 6 |
ISBN (Electronic) | 9781450344869 |
Publication status | Published - 2017 Apr 3 |
Event | 32nd Annual ACM Symposium on Applied Computing, SAC 2017 - Marrakesh, Morocco; Duration: 2017 Apr 4 → 2017 Apr 6 |
Publication series
Name | Proceedings of the ACM Symposium on Applied Computing |
---|---|
Volume | Part F128005 |
Other
Other | 32nd Annual ACM Symposium on Applied Computing, SAC 2017 |
---|---|
Country/Territory | Morocco |
City | Marrakesh |
Period | 2017 Apr 4 → 2017 Apr 6 |
Bibliographical note
Publisher Copyright: © 2017 ACM.
Keywords
- Open directory project
- Text classification
- Wikipedia
ASJC Scopus subject areas
- Software