Abstract
In quality estimation (QE), the quality of a translation is predicted from the source sentence and the machine translation (MT) output, without access to a reference sentence. However, there is a paradox: constructing a dataset for training a QE model requires non-trivial human labor and time, and may even require more effort than constructing a parallel corpus. In this study, to address this paradox and make the various applications of QE usable even in low-resource languages (LRLs), we propose a method for automatically constructing a pseudo-QE dataset without human labor. We perform a comparative analysis on the pseudo-QE dataset using multilingual pre-trained language models. Because we generate the pseudo dataset ourselves, we use the outputs of various external machine translators as test sets to verify the accuracy of the results objectively. The experimental results show that multilingual BART achieves the best performance, and we confirm the applicability of QE in LRLs using pseudo-QE dataset construction methods.
Original language | English |
---|---|
Pages | 1-10 |
Number of pages | 10 |
Publication status | Published - 2021 |
Event | 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021 - Virtual, Online, United States. Duration: 2021 Aug 16 → 2021 Aug 20 |
Conference
Conference | 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 2021 Aug 16 → 2021 Aug 20 |
Bibliographical note
Funding Information: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and the MSIT, Korea, under the ICT Creative Consilience program (IITP-2021-2020-0-01819) supervised by the IITP. Additionally, this work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00368, A Neural-Symbolic Model for Knowledge Acquisition and Inference Techniques).
Publisher Copyright:
© 2021 Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021. All rights reserved.
ASJC Scopus subject areas
- Software
- Language and Linguistics
- Human-Computer Interaction