Probabilistic Modeling of Korean Morphology

Do Gil Lee, Hae Chang Rim

    Research output: Contribution to journal › Article › peer-review

    20 Citations (Scopus)

    Abstract

    This paper proposes new probabilistic models for analyzing Korean morphology. To take advantage of the characteristics of Korean morphology, the proposed models are based on three linguistic units: the eojeol (a Korean spacing unit), the morpheme, and the syllable. Unlike previous approaches based on rules and dictionaries, the probabilistic approach proposed in this study can automatically acquire complete linguistic knowledge from part-of-speech (POS) tagged corpora. In addition, the approach can be applied without any system modification to other corpora with different tagsets and annotation guidelines. The three models and their combinations are evaluated on three corpora under a wide range of conditions. The eojeol-unit and syllable-unit models compensate for the weaknesses of the morpheme-unit model. The eojeol-unit model ran efficiently and improved precision. The syllable-unit model also improved precision and was particularly robust in handling unknown words. The proposed approach is also shown to outperform previous approaches.

    Original language: English
    Pages (from-to): 945-955
    Number of pages: 11
    Journal: IEEE Transactions on Audio, Speech and Language Processing
    Volume: 17
    Issue number: 5
    Publication status: Published, July 2009

    Bibliographical note

    Funding Information:
    Manuscript received August 01, 2008; revised March 08, 2009. Current version published June 17, 2009. This work was supported in part by a Korea University Grant and in part by the Second Brain Korea 21 Project and WCU (World Class University) program through the Korea Science and Engineering Foundation funded by the Ministry of Education, Science, and Technology (R31-2008-000-10008-0). D.-G. Lee’s Ph.D. dissertation is the basis of the work presented in this paper. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ruhi Sarikaya.

    Keywords

    • Korean morphology
    • machine learning
    • morphological analysis
    • probabilistic model

    ASJC Scopus subject areas

    • Acoustics and Ultrasonics
    • Electrical and Electronic Engineering
