Technology analysis from patent data using latent Dirichlet allocation

Gabjo Kim, Sangsung Park, Dongsik Jang

    Research output: Contribution to journal > Article > peer-review

    19 Citations (Scopus)

    Abstract

    This paper discusses how to apply latent Dirichlet allocation, a topic model, in a trend analysis methodology that exploits patent information. To accomplish this, text mining is used to convert unstructured patent documents into structured data. Next, the term frequency-inverse document frequency (tf-idf) value is used in the feature selection process. After the text preprocessing, the number of topics is decided using the perplexity value. In this study, we employed U.S. patent data on technology that reduces greenhouse gases. We extracted words from 50 relevant topics and showed that these topics are highly meaningful in explaining trends per period.
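The feature-selection and topic-count steps described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the abstract does not state which tf-idf variant, ranking rule, or cutoff was used, so the formula (relative term frequency times log inverse document frequency), the "maximum score per term" ranking, and the names `tfidf_scores`, `select_features`, and `perplexity` are all assumptions made for illustration. The LDA fitting itself is not shown; `theta` (document-topic weights) and `phi` (topic-word probabilities) stand in for the output of whatever LDA implementation is used.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def select_features(docs, top_k):
    """Keep the top_k terms ranked by their maximum tf-idf in the corpus.

    Ranking by the per-term maximum is an assumed rule, not the paper's.
    Ties are broken alphabetically so the result is deterministic.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:top_k]


def perplexity(docs, theta, phi):
    """Compute exp(-log-likelihood / token count) for a fitted topic model.

    theta[d][k] is the weight of topic k in document d;
    phi[k][term] is the probability of term under topic k.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for term in doc:
            p = sum(theta[d][k] * phi[k][term] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy corpus (invented tokens, not data from the paper).
docs = [
    ["carbon", "capture", "system", "method"],
    ["methane", "reduction", "system", "method"],
    ["carbon", "storage", "system", "method"],
]
vocab = select_features(docs, top_k=3)
```

In this toy corpus, terms that occur in every document ("system", "method") receive a tf-idf of zero and drop out of the selected vocabulary. To choose the number of topics, one would fit LDA for several candidate topic counts and compare `perplexity` on the same documents, with lower values indicating a better fit.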

    Original language: English
    Pages (from-to): 71-80
    Number of pages: 10
    Journal: Advances in Intelligent Systems and Computing
    Volume: 271
    Publication status: Published - 2014

    Bibliographical note

    Publisher Copyright:
    © Springer International Publishing Switzerland 2014.

    Keywords

    • Latent Dirichlet allocation
    • Text mining
    • Tf-idf
    • Topic model

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • General Computer Science
