Abstract
Educational question-answer generation has been extensively researched owing to its practical applicability. However, we identify a persistent challenge in the evaluation of such systems: existing evaluation methods often fail to produce objective results and instead exhibit a bias towards favoring high similarity to the ground-truth question-answer pairs. In this study, we demonstrate that these evaluation methods yield low human alignment and propose an alternative approach, Generative Interpretation (GI), to achieve more objective evaluations. Through experimental analysis, we show that GI outperforms existing evaluation methods in terms of human alignment and, using only BART-large, achieves performance comparable to GPT-3.5.
Original language | English |
---|---|
Title of host publication | EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024 |
Editors | Yvette Graham, Matthew Purver |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 2185-2196 |
Number of pages | 12 |
ISBN (Electronic) | 9798891760936 |
Publication status | Published - 2024 |
Event | 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024 - St. Julian's, Malta |
Duration | 2024 Mar 17 → 2024 Mar 22 |
Publication series
Name | EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024 |
---|
Conference
Conference | 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024 |
---|---|
Country/Territory | Malta |
City | St. Julian's |
Period | 2024 Mar 17 → 2024 Mar 22 |
Bibliographical note
Publisher Copyright:© 2024 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Software
- Linguistics and Language