Dealing with the Paradox of Quality Estimation

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

    Research output: Contribution to conference › Paper › peer-review

    4 Citations (Scopus)

    Abstract

    In quality estimation (QE), the quality of a translation is predicted from the source sentence and the machine translation (MT) output, without access to a reference sentence. However, there is a paradox: constructing a dataset for training a QE model requires non-trivial human labor and time, and it may even require more effort than constructing a parallel corpus. In this study, to address this paradox and make the various applications of QE available even in low-resource languages (LRLs), we propose a method for automatically constructing a pseudo-QE dataset without human labor. We perform a comparative analysis on the pseudo-QE dataset using multilingual pre-trained language models. Because the dataset is generated automatically, we conduct experiments using the outputs of various external machine translators as test sets to verify the results objectively. The experimental results show that multilingual BART achieves the best performance, and we confirm the applicability of QE in LRLs using pseudo-QE dataset construction methods.

    Original language: English
    Pages: 1-10
    Number of pages: 10
    Publication status: Published - 2021
    Event: 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021 - Virtual, Online, United States
    Duration: 2021 Aug 16 → 2021 Aug 20

    Conference

    Conference: 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021
    Country/Territory: United States
    City: Virtual, Online
    Period: 21/8/16 → 21/8/20

    Bibliographical note

    Funding Information:
    This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and the MSIT, Korea, under the ICT Creative Consilience program (IITP-2021-2020-0-01819) supervised by the IITP. Additionally, this work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00368, A Neural-Symbolic Model for Knowledge Acquisition and Inference Techniques).

    Publisher Copyright:
    © 2021 Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021. All rights reserved.

    ASJC Scopus subject areas

    • Software
    • Language and Linguistics
    • Human-Computer Interaction
