Quantifying Information of Tokens for Simple and Flexible Simultaneous Machine Translation

Dong Hyun Lee, Minkyung Park, Byung Jun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Simultaneous Translation (ST) involves translating from only partial source inputs rather than the entire source input, which can degrade translation quality. Previous approaches to balancing translation quality and latency have demonstrated that it is more efficient and effective to leverage an offline model with a reasonable policy. However, using an offline model also leads to a distribution shift, since it is not trained with partial source inputs; this can be mitigated by training an additional module that indicates when to translate. In this paper, we propose an Information Quantifier (IQ) that models source and target information to determine whether the offline model has sufficient information for translation. IQ is trained with oracle action sequences generated from the offline model. By quantifying information, IQ helps formulate a suitable policy for Simultaneous Translation that generalizes better and allows natural control of the trade-off between quality and latency. Experiments on various language pairs show that our proposed model outperforms baselines.

Original language: English
Title of host publication: CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings
Editors: Jing Jiang, David Reitter, Shumin Deng
Publisher: Association for Computational Linguistics (ACL)
Pages: 200-210
Number of pages: 11
ISBN (Electronic): 9798891760394
Publication status: Published - 2023
Externally published: Yes
Event: 27th Conference on Computational Natural Language Learning, CoNLL 2023 - Singapore, Singapore
Duration: 2023 Dec 6 to 2023 Dec 7

Publication series

Name: CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 27th Conference on Computational Natural Language Learning, CoNLL 2023
Country/Territory: Singapore
City: Singapore
Period: 23/12/6 to 23/12/7

Bibliographical note

Publisher Copyright:
© 2023 CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings. All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language
