Dealing with the Paradox of Quality Estimation

Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

Research output: Contribution to conference › Paper › peer-review

4 Citations (Scopus)

Abstract

In quality estimation (QE), the quality of a translation can be predicted by referencing the source sentence and the machine translation (MT) output, without access to a reference sentence. However, there is a paradox: constructing a dataset for training a QE model requires non-trivial human labor and time, and it may even require more effort than constructing a parallel corpus. In this study, to address this paradox and make the various applications of QE usable even in low-resource languages (LRLs), we propose a method for automatically constructing a pseudo-QE dataset without human labor. We perform a comparative analysis on the pseudo-QE dataset using multilingual pre-trained language models. Because the dataset we generate is pseudo data, we conduct experiments using the outputs of various external machine translators as test sets to verify the accuracy of the results objectively. The experimental results show that multilingual BART achieves the best performance, and we confirm the applicability of QE in LRLs using pseudo-QE dataset construction methods.

Original language: English
Pages: 1-10
Number of pages: 10
Publication status: Published - 2021
Event: 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021 - Virtual, Online, United States
Duration: 2021 Aug 16 to 2021 Aug 20

Conference

Conference: 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021
Country/Territory: United States
City: Virtual, Online
Period: 21/8/16 to 21/8/20

Bibliographical note

Funding Information:
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and the MSIT, Korea, under the ICT Creative Consilience program (IITP-2021-2020-0-01819) supervised by the IITP. Additionally, this work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00368, A Neural-Symbolic Model for Knowledge Acquisition and Inference Techniques).

Publisher Copyright:
© 2021 Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021. All rights reserved.

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Human-Computer Interaction
