Abstract
The evolution of large language models (LLMs) has culminated in a multitask model paradigm where prompts drive the generation of user-specific outputs. However, this advancement has revealed a challenge: in various scenarios, LLMs frequently produce outputs that violate socially accepted commonsense standards. To address this gap in commonsense reasoning, we present KoCommonGEN v2, a fine-grained benchmark dataset focused on Korean commonsense reasoning. This dataset, enriched with human annotations, comprises multiple-choice questions across seven error categories. The categories, which include commonsense memorization, numerical commonsense, and toxic speech, cover errors that can undermine the reliability of LLMs' commonsense reasoning. The empirical results show that LLMs struggle with Korean commonsense reasoning. With human accuracy benchmarked at approximately 85%, GPT-4's performance lags at about 74%, and other LLMs demonstrate an average accuracy of around 42%. Our findings emphasize the need for targeted improvements in Korean commonsense reasoning within LLMs, paving the way for more socially and contextually sensitive AI models. KoCommonGEN v2 is one of the benchmark datasets for the Open Ko-LLM Leaderboard.
| Original language | English |
|---|---|
| Title of host publication | The 62nd Annual Meeting of the Association for Computational Linguistics |
| Subtitle of host publication | Findings of the Association for Computational Linguistics, ACL 2024 |
| Editors | Lun-Wei Ku, Andre Martins, Vivek Srikumar |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 2390-2415 |
| Number of pages | 26 |
| ISBN (Electronic) | 9798891760998 |
| DOIs | |
| Publication status | Published - 2024 |
| Event | Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand; Duration: 2024 Aug 11 → 2024 Aug 16 |
Publication series
| Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
|---|---|
| ISSN (Print) | 0736-587X |
Conference
| Conference | Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 |
|---|---|
| Country/Territory | Thailand |
| City | Hybrid, Bangkok |
| Period | 2024 Aug 11 → 2024 Aug 16 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language
- Computer Science Applications
Fingerprint
Dive into the research topics of 'KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models'. Together they form a unique fingerprint.