A large-scale dataset for Korean document-level relation extraction from encyclopedia texts

Suhyune Son, Jungwoo Lim, Seonmin Koo, Jinsung Kim, Younghoon Kim, Youngsik Lim, Dongseok Hyun, Heuiseok Lim

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Document-level relation extraction (RE) aims to predict the relational facts between two given entities from a document. Unlike the widespread research on document-level RE in English, Korean document-level RE research is still at a very early stage due to the absence of a dataset. To accelerate such studies, we present the TREK (Toward Document-Level Relation Extraction in Korean) dataset, constructed from Korean encyclopedia documents written by domain experts. We provide detailed statistical analyses of our large-scale dataset, and human evaluation results suggest that TREK is of high quality. We also introduce a document-level RE model that takes named-entity types into account while considering the properties of the Korean language. In the experiments, we demonstrate that our proposed model outperforms the baselines, and we conduct a qualitative analysis.

    Original language: English
    Pages (from-to): 8681-8701
    Number of pages: 21
    Journal: Applied Intelligence
    Volume: 54
    Issue number: 17-18
    Publication status: Published - 2024 Sept

    Bibliographical note

    Publisher Copyright:
    © The Author(s) 2024.

    Keywords

    • Document-level Relation Extraction
    • Information Extraction
    • Korean Relation Extraction
    • Natural Language Processing

    ASJC Scopus subject areas

    • Artificial Intelligence
