Utilizing Wikipedia knowledge in open directory project-based text classification

Hae Yong Shin, Geun Jae Lee, Woo Jong Ryu, Sang-Geun Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    9 Citations (Scopus)

    Abstract

    Traditional Open Directory Project (ODP)-based text classification methods use a bag-of-words approach, which utilizes only single words in ODP documents and ignores important types of semantic information such as phrases and related terms. In this paper, we propose a method for enriching the semantic information in ODP documents by utilizing Wikipedia knowledge. First, we construct a phrase dictionary based on Wikipedia and search for Wikipedia phrases in ODP documents. Second, we select the most likely relevant Wikipedia articles and relevant hyperlinks for the Wikipedia phrases found in ODP documents. Finally, we add the Wikipedia phrases and relevant hyperlinks to the ODP documents to enrich their semantic information. Our evaluation results verify the efficacy of the proposed methodology.
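
The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy phrase dictionary `PHRASE_LINKS`, the `max_len` and `max_links` parameters, and all function names are assumptions made for illustration. In particular, the relevance-based selection of articles and hyperlinks described in the abstract is replaced here by simply taking the first few links listed for each phrase.

```python
import re

# Hypothetical toy dictionary: Wikipedia phrase (article title) -> outgoing hyperlinks.
# A real dictionary would be built from a Wikipedia dump.
PHRASE_LINKS = {
    "machine learning": ["artificial intelligence", "statistics", "data mining"],
    "neural network": ["deep learning", "perceptron"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrases(tokens, phrase_links, max_len=3):
    """Step 1: greedy longest-match search for multi-word dictionary phrases."""
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, down to two-word phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrase_links:
                found.append(candidate)
                i += n
                break
        else:
            # No phrase starts at this position; move on by one token.
            i += 1
    return found


def enrich(text, phrase_links, max_links=2):
    """Steps 2-3: append matched phrases and a few of their links as extra features."""
    tokens = tokenize(text)
    features = list(tokens)
    for phrase in find_phrases(tokens, phrase_links):
        features.append(phrase.replace(" ", "_"))
        # Assumption: take the first max_links links instead of ranking by relevance.
        for link in phrase_links[phrase][:max_links]:
            features.append(link.replace(" ", "_"))
    return features


doc = "Tutorials on machine learning and neural network design"
print(enrich(doc, PHRASE_LINKS))
```

The enriched feature list keeps the original single-word tokens and appends phrase and hyperlink features joined with underscores, so a standard bag-of-words classifier could consume them without modification.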

    Original language: English
    Title of host publication: 32nd Annual ACM Symposium on Applied Computing, SAC 2017
    Publisher: Association for Computing Machinery
    Pages: 309-314
    Number of pages: 6
    ISBN (Electronic): 9781450344869
    Publication status: Published - 2017 Apr 3
    Event: 32nd Annual ACM Symposium on Applied Computing, SAC 2017 - Marrakesh, Morocco
    Duration: 2017 Apr 4 - 2017 Apr 6

    Publication series

    Name: Proceedings of the ACM Symposium on Applied Computing
    Volume: Part F128005

    Other

    Other: 32nd Annual ACM Symposium on Applied Computing, SAC 2017
    Country/Territory: Morocco
    City: Marrakesh
    Period: 17/4/4 - 17/4/6

    Bibliographical note

    Publisher Copyright:
    © 2017 ACM.

    Keywords

    • Open directory project
    • Text classification
    • Wikipedia

    ASJC Scopus subject areas

    • Software
