Exploiting Hanja-Based Resources in Processing Korean Historic Documents Written by Common Literati

Hyeonseok Moon, Myunghoon Kang, Jaehyung Seo, Sugyeong Eo, Chanjun Park, Yeongwook Yang, Heuiseok Lim

Research output: Contribution to journal › Article › peer-review

Abstract

This research aims to explore the comprehension of historical Korean archives authored by common literati. Numerous endeavors have been made to study Korean historical documents; however, the majority of these endeavors focus solely on royal documents. By comparing the distinct linguistic characteristics between royal and commoner languages, this study challenges the applicability of the royal language-centric approach to commoner documents. In particular, we investigate the feasibility and limitations of existing resources that share the same writing system (Hanja) as historical Korean documents for processing Korean common literati documents. Through our investigation, we propose a simple yet effective methodology that enables the utilization of Hanja-based language resources in processing Korean common literati documents: the removal of special characters. We demonstrate that aligning characteristics of Hanja-based resources allows considerable performance improvements. To the best of our knowledge, our study represents the first research endeavor to concentrate on the comprehension of common literati documents.
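The preprocessing step named in the abstract, the removal of special characters, can be illustrated with a minimal sketch. The abstract does not say which characters count as "special", so this sketch assumes it means punctuation and symbol code points (Unicode general categories P* and S*). The function name `strip_special_characters` and the sample string are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import unicodedata


def strip_special_characters(text: str) -> str:
    """Drop punctuation (P*) and symbol (S*) code points; keep everything else.

    Assumption: "special characters" means Unicode punctuation and symbols.
    The source abstract does not define the exact character set.
    """
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in ("P", "S")
    )


# Illustrative Hanja string with full-width punctuation (not from the paper's data).
sample = "天地玄黃。宇宙洪荒,日月盈昃!"
print(strip_special_characters(sample))
```

Filtering by Unicode category instead of a hand-written character list keeps the sketch short and covers full-width CJK punctuation as well as ASCII marks. A real pipeline would likely need a character set tuned to the target corpus.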

Original language: English
Pages (from-to): 59909-59919
Number of pages: 11
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • ancient language processing
  • deep learning
  • named entity recognition
  • natural language processing
  • sentence segmentation

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
