Combining Dual Word Embeddings with Open Directory Project Based Text Classification

Dinara Aliyeva, Kang Min Kim, Byung Ju Choi, Sang-Geun Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    Traditional Open Directory Project (ODP)-based text classification methods effectively capture the topics of texts by exploiting the hierarchical structure of an explicitly human-built knowledge base. However, they rely only on term weighting and ignore the semantic similarity between words. In this paper, we account for word semantics by incorporating an implicit text representation, word2vec word embeddings, into ODP-based text classification. In contrast to the common usage of word2vec, we utilize both the input and the output vectors. This allows us to calculate a combined typical and topical similarity between the words of a category and those of a document, which is more effective for text classification. To this end, we first incorporate the dual word embeddings of word2vec into ODP-based text classification to obtain semantically richer category and document representations. Subsequently, we use the combination of the input and output vectors to improve the semantic similarity between category and document. Our evaluation on a real-world dataset shows the efficacy of the proposed approach, with significant improvements of 9% and 37% in F1-score and precision at k, respectively, over state-of-the-art techniques.
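
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of combining input (IN) and output (OUT) word2vec vectors when comparing a category with a document. Everything here is an assumption made for illustration: the centroid representation, the linear mixing weight `alpha`, the function names, and the toy two-dimensional vectors. It follows the common reading of dual embeddings in which IN-IN cosine similarity tends to reflect typical similarity and IN-OUT cosine similarity tends to reflect topical similarity; the paper's actual method may differ.

```python
import math

# Toy IN (input) and OUT (output) embeddings. These values are invented
# for illustration and do not come from any trained word2vec model.
IN_VECS = {
    "guitar": [1.0, 0.0],
    "piano": [0.9, 0.1],
    "goal": [0.0, 1.0],
    "match": [0.1, 0.9],
}
OUT_VECS = {
    "guitar": [0.8, 0.2],
    "piano": [0.9, 0.1],
    "goal": [0.2, 0.8],
    "match": [0.1, 0.9],
}

# Hypothetical ODP-style categories, each described by a bag of words.
CATEGORIES = {
    "Arts/Music": ["guitar", "piano"],
    "Sports/Soccer": ["goal", "match"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(words, table):
    """Average the vectors of the words found in `table`; None if none are found."""
    vecs = [table[w] for w in words if w in table]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def combined_similarity(category_words, document_words, in_vecs, out_vecs, alpha=0.5):
    """Mix an IN-IN score and an IN-OUT score with weight `alpha` (an assumption)."""
    cat_in = centroid(category_words, in_vecs)
    doc_in = centroid(document_words, in_vecs)
    doc_out = centroid(document_words, out_vecs)
    if cat_in is None or doc_in is None or doc_out is None:
        return 0.0
    typical = cosine(cat_in, doc_in)   # IN-IN comparison
    topical = cosine(cat_in, doc_out)  # IN-OUT comparison
    return alpha * typical + (1.0 - alpha) * topical


def classify(document_words, categories, in_vecs, out_vecs, alpha=0.5):
    """Return the category name with the highest combined similarity."""
    return max(
        categories,
        key=lambda name: combined_similarity(
            categories[name], document_words, in_vecs, out_vecs, alpha
        ),
    )


print(classify(["piano"], CATEGORIES, IN_VECS, OUT_VECS))
```

In this sketch, setting `alpha` to 1.0 falls back to the usual IN-only comparison, while lower values give more weight to the IN-OUT score. With a trained model, the two lookup tables would be filled from the model's input and output weight matrices instead of the hand-written values used here.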

    Original language: English
    Title of host publication: Proceedings of 2018 IEEE 17th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018
    Editors: Newton Howard, Sam Kwong, Yingxu Wang, Jerome Feldman, Bernard Widrow, Phillip Sheu
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 179-186
    Number of pages: 8
    ISBN (Electronic): 9781538633601
    Publication status: Published - 2018 Oct 4
    Event: 17th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2018 - Berkeley, United States
    Duration: 2018 Jul 16 to 2018 Jul 18

    Bibliographical note

    Publisher Copyright:
    © 2018 IEEE.

    Keywords

    • Machine Learning
    • Text Classification
    • Word embeddings

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Information Systems
    • Cognitive Neuroscience
