Toward robust classification using the Open Directory Project

Jongwoo Ha, Jung Hyun Lee, Won Jun Jang, Yong Ku Lee, Sang-Geun Lee

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    9 Citations (Scopus)

    Abstract

    The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory used in many studies and real-world applications. In this paper, we explore training data expansion techniques for text classification as one possible way to address the sparsity of the ODP dataset. We propose a dozen classification methods, which differ in (1) the categories from which training data is expanded, and (2) how the expanded training data is merged to generate centroid vectors. Evaluation results show that training data expansion yields significantly better classification performance than representative classifiers. We also find that (1) child and descendant categories are more valuable sources of expanded training data than parent and ancestor categories, and (2) distance-based weighting is superior to simple averaging for merging the expanded training data.
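
To make the idea in the abstract concrete, the following is a minimal sketch of centroid-based classification with training data expanded from descendant categories. It is not the authors' implementation: the toy taxonomy, the function names, the raw term-frequency vectors, and the weight 1 / (1 + distance) are all assumptions chosen for illustration, since the abstract does not specify the exact weighting or vector representation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy taxonomy (child -> parent); category names are illustrative.
PARENT = {
    "Top/Sports/Soccer": "Top/Sports",
    "Top/Sports/Tennis": "Top/Sports",
    "Top/Sports": "Top",
    "Top/Arts": "Top",
}


def descendants_with_distance(category):
    """Return {descendant: hop distance} for every descendant of `category`."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    result, frontier, depth = {}, [category], 0
    while frontier:
        depth += 1
        frontier = [c for node in frontier for c in children[node]]
        for c in frontier:
            result[c] = depth
    return result


def term_vector(text):
    """Raw term-frequency vector (an assumption; any term weighting could be used)."""
    return Counter(text.lower().split())


def normalize(vec):
    """Scale a sparse vector to unit length; an empty vector stays empty."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def expanded_centroid(category, docs_by_category, weighted=True):
    """Merge documents of a category and its descendants into one centroid.

    weighted=True uses an assumed distance-based weight 1 / (1 + distance);
    weighted=False treats all expanded documents equally (simple averaging).
    """
    sources = {category: 0}
    sources.update(descendants_with_distance(category))
    centroid = defaultdict(float)
    for cat, dist in sources.items():
        weight = 1.0 / (1 + dist) if weighted else 1.0
        for doc in docs_by_category.get(cat, []):
            for term, value in normalize(term_vector(doc)).items():
                centroid[term] += weight * value
    return normalize(centroid)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def classify(text, centroids):
    """Assign `text` to the category whose centroid is most similar."""
    query = normalize(term_vector(text))
    return max(centroids, key=lambda c: cosine(query, centroids[c]))
```

In this sketch a sparse category such as "Top/Sports", which has no documents of its own, still receives a usable centroid from its child categories. Switching `weighted` to `False` gives the simple-averaging variant that the abstract compares against.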

    Original language: English
    Title of host publication: DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics
    Editors: George Karypis, Longbing Cao, Wei Wang, Irwin King
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 607-612
    Number of pages: 6
    ISBN (Electronic): 9781479969913
    Publication status: Published - 2014 Mar 10
    Event: 2014 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2014 - Shanghai, China
    Duration: 2014 Oct 30 to 2014 Nov 1

    Publication series

    Name: DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics

    Other

    Other: 2014 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2014
    Country/Territory: China
    City: Shanghai
    Period: 14/10/30 to 14/11/1

    Bibliographical note

    Publisher Copyright:
    © 2014 IEEE.

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Information Systems
    • Information Systems and Management
