Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations

  • Jungwoo Lim
  • Taesun Whang
  • Dongyub Lee
  • Heuiseok Lim*

    *Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    2 Citations (Scopus)

    Abstract

    The main objective of a task-oriented dialogue system is to identify the intent and needs expressed in human dialogue. Many existing studies are conducted in a written-dialogue setting, and such models have difficulty coping with real-world spoken dialogues. To this end, the DSTC10 challenge organizers proposed the task of building robust dialogue state tracking (DST) models on spoken dialogues. Building on a strong existing DST model (i.e., MinTL), this article proposes integral components for constructing a dialogue state tracker: 1) data augmentation, which effectively enhances the model's ability to capture the entities that appear in the evaluation dataset, and 2) Levenshtein post-processing, which prevents distortion of model predictions caused by automatic speech recognition errors. To validate the effectiveness of our methods, we evaluate our model on the DSTC10 datasets and conduct a qualitative analysis by ablating each component of the model. Experimental results show that our model significantly outperforms the baselines on all evaluation metrics and took 3rd place in the challenge.
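    As an illustration of the Levenshtein post-processing idea mentioned in the abstract, the sketch below (not the authors' code; the ontology values, distance threshold, and function names are assumptions) snaps an ASR-distorted slot value predicted by a DST model back to the closest entry in a known slot-value ontology.

    ```python
    # Minimal illustrative sketch of Levenshtein post-processing for DST slot values.
    # Assumptions: a flat list of canonical ontology values and a small edit-distance
    # threshold; the paper's exact procedure may differ.

    def levenshtein(a: str, b: str) -> int:
        """Edit distance between a and b (insertions, deletions, substitutions)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def post_process(value: str, ontology_values: list[str], max_dist: int = 2) -> str:
        """Replace a predicted value with its nearest ontology value if close enough."""
        best = min(ontology_values, key=lambda v: levenshtein(value.lower(), v.lower()))
        return best if levenshtein(value.lower(), best.lower()) <= max_dist else value

    # Example: an ASR error turns "guest house" into "guess house";
    # post-processing restores the canonical ontology value.
    print(post_process("guess house", ["hotel", "guest house", "apartment"]))  # guest house
    ```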

    Original language: English
    Pages (from-to): 727-732
    Number of pages: 6
    Journal: IEEE/ACM Transactions on Audio, Speech and Language Processing
    Volume: 32
    DOIs
    Publication status: Published - 2024

    Bibliographical note

    Publisher Copyright:
    © 2023 The Authors.

    Keywords

    • DSTC10
    • dialogue state tracking
    • spoken dialogue

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • Computational Mathematics
    • Electrical and Electronic Engineering
    • Acoustics and Ultrasonics
