Bringing bag-of-phrases to ODP-based text classification

Haeyong Shin, Byung Gul Ryu, Woo Jong Ryu, Geunjae Lee, Sang-Geun Lee

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    1 Citation (Scopus)

    Abstract

    The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory, and many studies and real-world applications build on ODP-based classifiers. However, existing approaches rely on the traditional bag-of-words representation of text, and words alone do not always provide atomic units of semantic meaning. In this paper, we propose a novel framework that better captures the semantic meaning of text by bringing bag-of-phrases to ODP-based text classification. The proposed method uses a syntactic tree to extract phrases from ODP and applies a phrase selection method to alleviate the high dimensionality of the bag-of-phrases representation. Evaluation results demonstrate that our approach outperforms state-of-the-art methods in classification performance.
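The abstract does not say which parser, phrase types, or selection criterion the authors use, so the following is only a minimal illustrative sketch of the general idea rather than the paper's method. It assumes hand-written toy parse trees (nested tuples), treats noun-phrase (`NP`) subtrees as phrases, and uses a simple document-frequency threshold (`min_df`) as a stand-in for phrase selection. All function names and example sentences are hypothetical.

```python
from collections import Counter

# Toy constituency trees: (label, child, ...) with plain strings as leaves.
# In practice a syntactic parser would produce these; here they are hand-written.
DOC1 = ("S",
        ("NP", ("JJ", "open"), ("NN", "source"), ("NN", "software")),
        ("VP", ("VBZ", "powers"), ("NP", ("NN", "web"), ("NNS", "servers"))))
DOC2 = ("S",
        ("NP", ("NN", "web"), ("NNS", "servers")),
        ("VP", ("VBP", "run"),
         ("NP", ("JJ", "open"), ("NN", "source"), ("NN", "software"))))
DOC3 = ("S",
        ("NP", ("DT", "the"), ("NN", "source")),
        ("VP", ("VBZ", "is"), ("ADJP", ("JJ", "open"))))


def leaves(tree):
    """Return the words under a tree node, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def extract_phrases(tree, label="NP"):
    """Collect the word sequence of every subtree whose label matches."""
    if isinstance(tree, str):
        return []
    phrases = []
    if tree[0] == label:
        phrases.append(" ".join(leaves(tree)).lower())
    for child in tree[1:]:
        phrases.extend(extract_phrases(child, label))
    return phrases


def select_phrases(docs_phrases, min_df=2):
    """Keep phrases that occur in at least min_df documents.

    Assumption: a document-frequency cutoff stands in for the paper's
    (unspecified) phrase selection method.
    """
    df = Counter()
    for phrases in docs_phrases:
        df.update(set(phrases))
    return {p for p, n in df.items() if n >= min_df}


def bag_of_phrases(phrases, vocabulary):
    """Count only the phrases that survived selection."""
    return Counter(p for p in phrases if p in vocabulary)


all_phrases = [extract_phrases(d) for d in (DOC1, DOC2, DOC3)]
vocabulary = select_phrases(all_phrases, min_df=2)
vectors = [bag_of_phrases(p, vocabulary) for p in all_phrases]
```

In this toy data, the third document shares the words "open" and "source" with the other two, so a bag-of-words representation would register overlap, while its bag-of-phrases vector is empty because "open source software" never occurs in it as a unit. The threshold also drops the rare phrase "the source", which hints at how selection keeps the phrase vocabulary small.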

    Original language: English
    Title of host publication: 2016 International Conference on Big Data and Smart Computing, BigComp 2016
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 485-488
    Number of pages: 4
    ISBN (Electronic): 9781467387965
    Publication status: Published - 2016 Mar 3
    Event: International Conference on Big Data and Smart Computing, BigComp 2016 - Hong Kong, China
    Duration: 2016 Jan 18 to 2016 Jan 20

    Publication series

    Name: 2016 International Conference on Big Data and Smart Computing, BigComp 2016

    Other

    Other: International Conference on Big Data and Smart Computing, BigComp 2016
    Country/Territory: China
    City: Hong Kong
    Period: 16/1/18 to 16/1/20

    Bibliographical note

    Publisher Copyright:
    © 2016 IEEE.

    Keywords

    • open directory project
    • syntactic structure
    • text classification
    • text mining

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Information Systems and Management
