Abstract
The main objective of a task-oriented dialogue system is to identify the user's intent and needs from a dialogue. Many existing studies assume written dialogue, but coping with real-world spoken dialogue remains difficult. To address this, the DSTC10 challenge organizers proposed the task of building robust dialogue state tracking (DST) models for spoken dialogues. Building on a strong existing DST model (MinTL), this article proposes two integral components for a dialogue state tracker: 1) data augmentation, which enhances the model's ability to capture the entities that appear in the evaluation dataset, and 2) Levenshtein post-processing, which aims to prevent distortion of model predictions caused by automatic speech recognition errors. To validate the effectiveness of our methods, we evaluate our model on the DSTC10 datasets and conduct a qualitative analysis by ablating each component of the model. Experimental results show that our model significantly outperforms the baselines on all evaluation metrics and took 3rd place in the challenge.
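The abstract does not spell out how the Levenshtein post-processing works, so the following is only a minimal sketch of one plausible reading: a predicted slot value that may have been distorted by speech recognition is snapped to the closest known entity name by edit distance. The function names, the `max_ratio` threshold, and the example entity names are all hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def snap_to_ontology(predicted: str, candidates: list[str],
                     max_ratio: float = 0.4) -> str:
    """Return the closest candidate if it is near enough, else the prediction.

    max_ratio is an assumed cutoff: the edit distance must not exceed this
    fraction of the longer string's length for a replacement to happen.
    """
    if not candidates:
        return predicted
    norm = predicted.lower().strip()
    best = min(candidates, key=lambda c: levenshtein(norm, c.lower()))
    distance = levenshtein(norm, best.lower())
    if distance <= max_ratio * max(len(norm), len(best), 1):
        return best
    return predicted


if __name__ == "__main__":
    # Hypothetical entity list; a real system would load it from the ontology.
    hotels = ["Hotel Nikko", "Hotel Zephyr"]
    print(snap_to_ontology("hotel nico", hotels))
    print(snap_to_ontology("italian", hotels))
```

The threshold keeps the correction conservative: a value that is far from every known entity is left as predicted rather than forced onto an unrelated name.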
Original language | English |
---|---|
Pages (from-to) | 727-732 |
Number of pages | 6 |
Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
Volume | 32 |
Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2023 The Authors.
Keywords
- DSTC10
- dialogue state tracking
- spoken dialogue
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Computational Mathematics
- Electrical and Electronic Engineering
- Acoustics and Ultrasonics