Improving Formality-Sensitive Machine Translation using Data-Centric Approaches and Prompt Engineering

Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

In this paper, we present the KU x Upstage team's submission to the Special Task on Formality Control on Spoken Language Translation, which involves translating English into four languages with diverse grammatical formality markers. Our methodology comprises two primary components: 1) a language-specific data-driven approach, and 2) the generation of synthetic data using large-scale language models and empirically grounded prompt engineering. By adapting methodologies and models to the unique linguistic properties of each language, we observe a notable improvement in performance over the baseline, substantiating the efficacy of data-driven approaches. Moreover, our prompt engineering strategy yields higher-quality synthetic translation instances.

Original language: English
Title of host publication: 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference
Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
Publisher: Association for Computational Linguistics
Pages: 420-432
Number of pages: 13
ISBN (Electronic): 9781959429845
Publication status: Published - 2023
Event: 20th International Conference on Spoken Language Translation, IWSLT 2023 - Hybrid, Toronto, Canada
Duration: 2023 Jul 13 → 2023 Jul 14

Publication series

Name: 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference

Conference

Conference: 20th International Conference on Spoken Language Translation, IWSLT 2023
Country/Territory: Canada
City: Hybrid, Toronto
Period: 23/7/13 → 23/7/14

Bibliographical note

Publisher Copyright:
© IWSLT 2023. All rights reserved.

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Linguistics and Language

