KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models

  • Jaehyung Seo
  • , Jaewook Lee
  • , Chanjun Park
  • , Seongtae Hong
  • , Seungjun Lee
  • , Heuiseok Lim*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The evolution of large language models (LLMs) has culminated in a multitask model paradigm where prompts drive the generation of user-specific outputs. However, this advancement has revealed a challenge: LLMs frequently produce outputs against socially acceptable commonsense standards in various scenarios. To address this gap in commonsense reasoning, we present KoCommonGEN v2, a fine-grained benchmark dataset focused on Korean commonsense reasoning. This dataset, enriched with human annotations, comprises multiple-choice questions across seven error categories. These categories include commonsense memorization, numerical commonsense, toxic speech, and more, which are vulnerable to undermining the reliability of LLMs' commonsense reasoning capabilities. The empirical results present that LLMs struggle with Korean commonsense reasoning. With human accuracy benchmarked at approximately 85%, GPT-4's performance lags at about 74%, and other LLMs demonstrate an average accuracy of around 42%. Our findings emphasize the need for targeted improvements in Korean commonsense reasoning within LLMs, paving the way for more socially and contextually sensitive AI models. KoCommonGEN v2 is one of the benchmark datasets for the Open Ko-LLM Leaderboard.

Original languageEnglish
Title of host publicationThe 62nd Annual Meeting of the Association for Computational Linguistics
Subtitle of host publicationFindings of the Association for Computational Linguistics, ACL 2024
EditorsLun-Wei Ku, Andre Martins, Vivek Srikumar
PublisherAssociation for Computational Linguistics (ACL)
Pages2390-2415
Number of pages26
ISBN (Electronic)9798891760998
DOIs
Publication statusPublished - 2024
EventFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 2024 Aug 112024 Aug 16

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

ConferenceFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/TerritoryThailand
CityHybrid, Bangkok
Period24/8/1124/8/16

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models'. Together they form a unique fingerprint.

Cite this