Abstract
This research explores the comprehension of historical Korean archives authored by common literati. Numerous efforts have been made to study Korean historical documents; however, most of them focus solely on royal documents. By comparing the linguistic characteristics of royal and commoner language, this study challenges the applicability of the royal language-centric approach to commoner documents. In particular, we investigate the feasibility and limitations of using existing resources that share the same writing system (Hanja) as historical Korean documents to process Korean common literati documents. Based on this investigation, we propose a simple yet effective methodology that enables the use of Hanja-based language resources in processing Korean common literati documents: the removal of special characters. We demonstrate that aligning the characteristics of Hanja-based resources yields considerable performance improvements. To the best of our knowledge, our study is the first to concentrate on the comprehension of common literati documents.
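As a rough illustration of the preprocessing step named in the abstract, the sketch below strips special characters from a Hanja string. The abstract does not define which characters count as "special", so this sketch assumes they are the characters in Unicode's punctuation and symbol categories; the function name `strip_special_characters` and the sample sentence are hypothetical and are not taken from the paper.

```python
import unicodedata


def strip_special_characters(text: str) -> str:
    """Drop punctuation and symbol characters; keep everything else.

    Assumption: "special characters" means Unicode general categories
    starting with P (punctuation) or S (symbol). The paper may use a
    different definition.
    """
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in ("P", "S")
    )


# Hypothetical sample: a Hanja sentence with full-width punctuation.
sample = "孔子曰:「學而時習之」"
cleaned = strip_special_characters(sample)
print(cleaned)
```

Filtering by Unicode category rather than by a hand-written character list keeps the sketch short, but a real pipeline would likely need a list tuned to the transcription conventions of each corpus.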
Original language | English |
---|---|
Pages (from-to) | 59909-59919 |
Number of pages | 11 |
Journal | IEEE Access |
Volume | 12 |
DOIs | |
Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- ancient language processing
- deep learning
- named entity recognition
- natural language processing
- sentence segmentation
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering