TY - GEN
T1 - Utilizing Wikipedia knowledge in open directory project-based text classification
AU - Shin, Hae Yong
AU - Lee, Geun Jae
AU - Ryu, Woo Jong
AU - Lee, Sang-Geun
N1 - Funding Information:
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (number 2015R1A2A1A10052665).
Publisher Copyright:
© 2017 ACM.
PY - 2017/4/3
Y1 - 2017/4/3
N2 - Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which utilizes only single words in ODP documents and ignores important types of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents by utilizing Wikipedia knowledge. First, we construct a phrase dictionary based on Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for Wikipedia phrases in ODP documents. Finally, we add Wikipedia phrases and relevant hyperlinks to ODP documents to enrich the semantic information. Our evaluation results verify the efficacy of the proposed methodology.
AB - Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which utilizes only single words in ODP documents and ignores important types of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents by utilizing Wikipedia knowledge. First, we construct a phrase dictionary based on Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for Wikipedia phrases in ODP documents. Finally, we add Wikipedia phrases and relevant hyperlinks to ODP documents to enrich the semantic information. Our evaluation results verify the efficacy of the proposed methodology.
KW - Open directory project
KW - Text classification
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85020899791&partnerID=8YFLogxK
U2 - 10.1145/3019612.3019
DO - 10.1145/3019612.3019
M3 - Conference contribution
AN - SCOPUS:85020899791
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 309
EP - 314
BT - 32nd Annual ACM Symposium on Applied Computing, SAC 2017
PB - Association for Computing Machinery
T2 - 32nd Annual ACM Symposium on Applied Computing, SAC 2017
Y2 - 4 April 2017 through 6 April 2017
ER -