From Text Classification to Keyphrase Extraction for Short Text

Song Eun Lee, Kang Min Kim, Woo Jong Ryu, Jemin Park, Sangkeun Lee

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    2 Citations (Scopus)

    Abstract

    Existing keyphrase extraction approaches often suffer from issues such as the sparsity and brevity of short text (e.g., headlines, queries, and tweets). In this paper, we propose a novel keyphrase extraction method for short text by utilizing recurrent neural networks. The main idea behind our approach is to classify short text into a relevant class or category and extract keyphrases from the words that are important for that class or category. Unlike previous supervised approaches that need annotated keyphrases, our approach requires only a text classification dataset (i.e., DBpedia), which is easier to use and requires less human effort. In our approach, we first feed short text into the attention-based neural network for text classification. We then compute the attention weight of each word in the input short text. Subsequently, we detect keyphrase candidates by chunking phrases and summing the attention weights of the constituent words in each chunked phrase. The experimental results show the efficacy of our approach on real-world datasets, such as headlines, queries, and tweets. The proposed method outperforms the Microsoft Cognitive Services and IBM Watson Natural Language Understanding service for keyphrase extraction in terms of F1-score and acceptable percentage on the NYT and Question datasets. Further, we confirm that the proposed method is comparable to supervised methods for keyphrase extraction from short text in the Tweet dataset.
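
The candidate-scoring step described in the abstract (chunk phrases, then sum the attention weights of the words in each chunk) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `score_phrases`, the half-open `(start, end)` span format, and the example weights are all assumptions made for illustration, and the attention weights and chunk spans are taken as given rather than produced by a real classifier or chunker.

```python
from typing import List, Tuple


def score_phrases(
    tokens: List[str],
    attention: List[float],
    chunks: List[Tuple[int, int]],
) -> List[Tuple[str, float]]:
    """Rank chunked phrases by the summed attention weight of their words.

    `chunks` holds half-open (start, end) token spans, assumed to come
    from an external phrase chunker (hypothetical input format).
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")
    scored = []
    for start, end in chunks:
        phrase = " ".join(tokens[start:end])
        # Phrase score = sum of the attention weights of its words.
        scored.append((phrase, sum(attention[start:end])))
    # Highest total attention first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical example: these weights are made up, not model output.
tokens = ["apple", "unveils", "new", "iphone", "in", "california"]
attention = [0.30, 0.05, 0.10, 0.35, 0.02, 0.18]
chunks = [(0, 1), (2, 4), (5, 6)]
ranked = score_phrases(tokens, attention, chunks)
```

In this toy input, the two-word chunk accumulates the weights of both of its words, so a multi-word phrase can outrank a single word that has a higher individual weight than either of its parts.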

    Original language: English
    Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
    Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1137-1142
    Number of pages: 6
    ISBN (Electronic): 9781728108582
    Publication status: Published - 2019 Dec
    Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
    Duration: 2019 Dec 9 to 2019 Dec 12

    Publication series

    Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

    Conference

    Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
    Country/Territory: United States
    City: Los Angeles
    Period: 19/12/9 to 19/12/12

    Bibliographical note

    Publisher Copyright:
    © 2019 IEEE.

    Copyright:
    Copyright 2020 Elsevier B.V., All rights reserved.

    Keywords

    • Attention mechanism
    • Deep neural network
    • Keyphrase extraction
    • Knowledge base
    • Text classification

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Networks and Communications
    • Information Systems
    • Information Systems and Management
