Abstract
Word embedding is considered an essential factor in improving the performance of various Natural Language Processing (NLP) models. However, it is hardly applicable in real-world datasets as word embedding is generally studied with a well-refined corpus. Notably, in Hangeul (Korean writing system), which has a unique writing system, various kinds of Out-Of-Vocabulary (OOV) appear from typos. In this paper, we propose a robust Hangeul word embedding model against typos, while maintaining high performance. The proposed model utilizes a Convolutional Neural Network (CNN) architecture with a channel attention mechanism that learns to infer the original word embeddings. The model train with a dataset that consists of a mix of typos and correct words. To demonstrate the effectiveness of the proposed model, we conduct three kinds of intrinsic and extrinsic tasks. While the existing embedding models fail to maintain stable performance as the noise level increases, the proposed model shows stable performance.
Original language | English |
---|---|
Title of host publication | EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 3213-3221 |
Number of pages | 9 |
ISBN (Electronic) | 9781954085022 |
Publication status | Published - 2021 |
Event | 16th Conference of the European Chapter of the Associationfor Computational Linguistics, EACL 2021 - Virtual, Online Duration: 2021 Apr 19 → 2021 Apr 23 |
Publication series
Name | EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference |
---|
Conference
Conference | 16th Conference of the European Chapter of the Associationfor Computational Linguistics, EACL 2021 |
---|---|
City | Virtual, Online |
Period | 21/4/19 → 21/4/23 |
Bibliographical note
Publisher Copyright:© 2021 Association for Computational Linguistics
ASJC Scopus subject areas
- Software
- Computational Theory and Mathematics
- Linguistics and Language