Abstract
This paper proposes new probabilistic models for analyzing Korean morphology. To take advantage of the characteristics of Korean morphology, the proposed models are based on three linguistic units: eojeol (a Korean spacing unit), morpheme, and syllable. Unlike previous approaches that are based on rules and dictionaries, the probabilistic approach proposed in this study can automatically acquire complete linguistic knowledge from part-of-speech (POS) tagged corpora. In addition, this approach is easily applicable, without any system modification, to other corpora with different tagsets and annotation guidelines. The three models and their combinations are evaluated on three corpora over a wide range of conditions. The eojeol-unit and syllable-unit models compensate for the weaknesses of the morpheme-unit model. The eojeol-unit model performed efficiently and improved precision. The syllable-unit model also improved precision and was particularly robust in handling unknown words. The proposed approach is also shown to outperform previous approaches.
Original language | English |
---|---|
Pages (from-to) | 945-955 |
Number of pages | 11 |
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 17 |
Issue number | 5 |
Publication status | Published - 2009 Jul |
Keywords
- Korean morphology
- machine learning
- morphological analysis
- probabilistic model
ASJC Scopus subject areas
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering