TY - JOUR
T1 - A Survey on Evaluation Metrics for Machine Translation
AU - Lee, Seungjun
AU - Lee, Jungseob
AU - Moon, Hyeonseok
AU - Park, Chanjun
AU - Seo, Jaehyung
AU - Eo, Sugyeong
AU - Koo, Seonmin
AU - Lim, Heuiseok
N1 - Funding Information:
This research was supported by the Ministry of Science and ICT, Korea, under the Information Technology Research Center support program (IITP-2018-0-01405) supervised by the Institute for Information & Communications Technology Planning & Evaluation), and by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education (NRF-2022R1A2C1007616).
Publisher Copyright:
© 2023 by the authors.
PY - 2023/2
Y1 - 2023/2
N2 - The success of Transformer architecture has seen increased interest in machine translation (MT). The translation quality of neural network-based MT transcends that of translations derived using statistical methods. This growth in MT research has entailed the development of accurate automatic evaluation metrics that allow us to track the performance of MT. However, automatically evaluating and comparing MT systems is a challenging task. Several studies have shown that traditional metrics (e.g., BLEU, TER) show poor performance in capturing semantic similarity between MT outputs and human reference translations. To date, to improve performance, various evaluation metrics have been proposed using the Transformer architecture. However, a systematic and comprehensive literature review on these metrics is still missing. Therefore, it is necessary to survey the existing automatic evaluation metrics of MT to enable both established and new researchers to quickly understand the trend of MT evaluation over the past few years. In this survey, we present the trend of automatic evaluation metrics. To better understand the developments in the field, we provide the taxonomy of the automatic evaluation metrics. Then, we explain the key contributions and shortcomings of the metrics. In addition, we select the representative metrics from the taxonomy, and conduct experiments to analyze related problems. Finally, we discuss the limitation of the current automatic metric studies through the experimentation and our suggestions for further research to improve the automatic evaluation metrics.
AB - The success of Transformer architecture has seen increased interest in machine translation (MT). The translation quality of neural network-based MT transcends that of translations derived using statistical methods. This growth in MT research has entailed the development of accurate automatic evaluation metrics that allow us to track the performance of MT. However, automatically evaluating and comparing MT systems is a challenging task. Several studies have shown that traditional metrics (e.g., BLEU, TER) show poor performance in capturing semantic similarity between MT outputs and human reference translations. To date, to improve performance, various evaluation metrics have been proposed using the Transformer architecture. However, a systematic and comprehensive literature review on these metrics is still missing. Therefore, it is necessary to survey the existing automatic evaluation metrics of MT to enable both established and new researchers to quickly understand the trend of MT evaluation over the past few years. In this survey, we present the trend of automatic evaluation metrics. To better understand the developments in the field, we provide the taxonomy of the automatic evaluation metrics. Then, we explain the key contributions and shortcomings of the metrics. In addition, we select the representative metrics from the taxonomy, and conduct experiments to analyze related problems. Finally, we discuss the limitation of the current automatic metric studies through the experimentation and our suggestions for further research to improve the automatic evaluation metrics.
KW - Transformer
KW - automatic evaluation metric
KW - deep learning
KW - machine translation
UR - http://www.scopus.com/inward/record.url?scp=85148584668&partnerID=8YFLogxK
U2 - 10.3390/math11041006
DO - 10.3390/math11041006
M3 - Article
AN - SCOPUS:85148584668
SN - 2227-7390
VL - 11
JO - Mathematics
JF - Mathematics
IS - 4
M1 - 1006
ER -