Abstract
The aim of a spelling correction task is to detect spelling errors and automatically correct them. In this paper, we approach Korean spelling correction from a machine translation perspective, which allows it to overcome limitations of cost, time, and data. Based on a sequence-to-sequence model, our approach uses an ‘error-filled sentence’ as the source sentence and its correct counterpart as the target sentence, thus ‘translating’ the erroneous sentence into a correct one. We also propose three new data generation methods that allow multiple spelling correction parallel corpora to be created from a single monolingual corpus. Additionally, we found that applying the Copy Mechanism not only resolves the problem of overcorrection but also improves overall performance. We evaluate our model on two aspects: performance comparison with other models and evaluation of overcorrection. The results show that the proposed model even outperforms other systems currently in commercial use.
Original language | English |
---|---|
Pages (from-to) | 34591-34608 |
Number of pages | 18 |
Journal | Multimedia Tools and Applications |
Volume | 80 |
Issue number | 26-27 |
Publication status | Published - 2021 Nov |
Bibliographical note
Funding Information: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2017M3C4A7068189). I am very grateful to my friend Yejin Jang for helping me correct the English.
Publisher Copyright:
© 2020, Springer Science+Business Media, LLC, part of Springer Nature.
Keywords
- Automatic noise generation
- Copy mechanism
- Korean spelling correction
- Neural machine translation
- Overcorrection
- Transformer
ASJC Scopus subject areas
- Software
- Media Technology
- Hardware and Architecture
- Computer Networks and Communications