LIQUID: A Framework for List Question Answering Dataset Generation

Seongyun Lee, Hyunjae Kim, Jaewoo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Question answering (QA) models often rely on large-scale training datasets, which necessitates the development of a data generation framework to reduce the cost of manual annotations. Although several recent studies have aimed to generate synthetic questions with single-span answers, no study has been conducted on the creation of list questions with multiple, non-contiguous spans as answers. To address this gap, we propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora. We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers. This allows us to select answers that are semantically correlated in context and are therefore suitable for constructing list questions. We then create questions using an off-the-shelf question generator with the extracted entities and the original passage. Finally, iterative filtering and answer expansion are performed to ensure the accuracy and completeness of the answers. Using our synthetic data, we significantly improve the exact-match F1 scores of the previous best list QA models by 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
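
The four stages named in the abstract (summarization, entity extraction, question generation, then filtering and expansion) can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the function `build_list_qa_example`, its parameters, and the `toy_*` stand-ins are hypothetical names introduced here, and the abstract does not say how filtering and expansion are carried out. The version below assumes a QA model is run on the generated question, candidates it does not recover are dropped, extra spans it predicts are added, and the question is regenerated until the answer list stops changing.

```python
from typing import Callable, List, Optional, Tuple


def build_list_qa_example(
    passage: str,
    summarize: Callable[[str], str],
    extract_entities: Callable[[str], List[str]],
    generate_question: Callable[[str, List[str]], str],
    answer_question: Callable[[str, str], List[str]],
    max_rounds: int = 3,
) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical sketch of a summarize -> NER -> QG -> filter/expand loop."""
    summary = summarize(passage)
    # Candidate answers: de-duplicated entities from the summary that also
    # occur verbatim in the original passage.
    answers = [e for e in dict.fromkeys(extract_entities(summary)) if e in passage]
    if len(answers) < 2:  # a list question needs at least two answers
        return None
    question = generate_question(passage, answers)
    for _ in range(max_rounds):
        predicted = list(dict.fromkeys(answer_question(question, passage)))
        # Filtering (assumed): keep candidates the QA model also predicts.
        kept = [a for a in answers if a in predicted]
        # Expansion (assumed): add predicted spans missing from the candidates.
        added = [p for p in predicted if p not in kept and p in passage]
        revised = kept + added
        if revised == answers:  # answer list is stable, stop iterating
            break
        answers = revised
        if len(answers) < 2:
            return None
        question = generate_question(passage, answers)
    return question, answers


# Toy stand-ins so the sketch runs without any model (all hypothetical).
PASSAGE = ("In Paris, Marie Curie studied polonium and radium. "
           "Later, thorium was also examined.")


def toy_summarize(passage: str) -> str:
    return passage.split(". ")[0] + "."  # first sentence only


def toy_entities(text: str) -> List[str]:
    return [w for w in ("Paris", "polonium", "radium", "thorium") if w in text]


def toy_question(passage: str, answers: List[str]) -> str:
    return "Which chemical elements are mentioned?"


def toy_qa(question: str, passage: str) -> List[str]:
    return [w for w in ("polonium", "radium", "thorium") if w in passage]


result = build_list_qa_example(
    PASSAGE, toy_summarize, toy_entities, toy_question, toy_qa
)
print(result)
```

In this toy run the summary yields the candidates "Paris", "polonium", and "radium"; the assumed filtering step drops "Paris" and the assumed expansion step adds "thorium", which appears only outside the summary. In practice each callable would wrap a real summarizer, named entity recognizer, question generator, and QA model.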

Original language: English
Title of host publication: AAAI-23 Technical Tracks 11
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Pages: 13014-13024
Number of pages: 11
ISBN (Electronic): 9781577358800
Publication status: Published - 2023 Jun 27
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 2023 Feb 7 → 2023 Feb 14

Publication series

Name: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume: 37

Conference

Conference: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory: United States
City: Washington
Period: 23/2/7 → 23/2/14

Bibliographical note

Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
