Rich character-level information for Korean morphological analysis and part-of-speech tagging

Andrew Matteson, Chanhee Lee, Heuiseok Lim, Young Bum Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

16 Citations (Scopus)

Abstract

Due to the fact that Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically employs the use of sub-character features known as graphemes or otherwise utilizes comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models have been created with the assumption that character-level, dictionary-less morphological analysis was intractable due to the number of actions required. We present, in this study, a multi-stage action-based model that can perform morphological transformation and part-of-speech tagging using arbitrary units of input and apply it to the case of character-level Korean morphological analysis. Among models that do not employ prior linguistic knowledge, we achieve state-of-the-art word and sentence-level tagging accuracy with the Sejong Korean corpus using our proposed data-driven Bi-LSTM model.

Original languageEnglish
Title of host publicationCOLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
EditorsEmily M. Bender, Leon Derczynski, Pierre Isabelle
PublisherAssociation for Computational Linguistics (ACL)
Pages2482-2492
Number of pages11
ISBN (Electronic)9781948087506
Publication statusPublished - 2018
Event27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: 2018 Aug 202018 Aug 26

Publication series

NameCOLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference27th International Conference on Computational Linguistics, COLING 2018
Country/TerritoryUnited States
CitySanta Fe
Period18/8/2018/8/26

Bibliographical note

Funding Information:
This research was supported by the MSIT (Ministry of Science and ICT), South Korea, under the ITRC (Information Technology Research Center) support program (”Research and Development of Human-Inspired Multiple Intelligence”) supervised by the IITP (Institute for Information & Communications Technology Promotion). Additionally, this work was supported by the National Research Foundation of Korea (NRF) grant funded by the South Korean government (MSIP) (No. NRF-2016R1A2B2015912).

Publisher Copyright:
© 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Rich character-level information for Korean morphological analysis and part-of-speech tagging'. Together they form a unique fingerprint.

Cite this