Abstract
Recent machine translation (MT) systems have overcome language barriers for a wide range of users, yet they still carry the risk of critical meaning deviation. Critical error detection (CED) is the task of identifying the inherent risk of catastrophic meaning distortions in machine translation output. Given the importance of reflecting cultural elements when detecting critical errors, we introduce a culture-aware “Politeness” type for detecting English-Korean critical translation errors. In addition, by providing multiclass labels, we facilitate two tasks: critical error detection and critical error type classification (CETC). Empirical evaluations reveal that our data augmentation approach, which uses a newly presented perturber, significantly outperforms existing baselines in both tasks. Further analysis highlights the significance of multiclass labeling by demonstrating its superior effectiveness compared to binary labels.
Original language | English |
---|---|
Title of host publication | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings |
Editors | Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue |
Publisher | European Language Resources Association (ELRA) |
Pages | 4705-4716 |
Number of pages | 12 |
ISBN (Electronic) | 9782493814104 |
Publication status | Published - 2024 |
Event | Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy. Duration: 2024 May 20 → 2024 May 25 |
Publication series
Name | 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings |
---|
Conference
Conference | Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 |
---|---|
Country/Territory | Italy |
City | Hybrid, Torino |
Period | 2024 May 20 → 2024 May 25 |
Bibliographical note
Publisher Copyright: © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Keywords
- Critical error detection
- Large language model
- Neural machine translation
- Quality estimation
ASJC Scopus subject areas
- Theoretical Computer Science
- Computational Theory and Mathematics
- Computer Science Applications