Abstract
This paper explores the potential of Large Language Models (LLMs), specifically GPT-4, to enhance the precision and effectiveness of Automated Assessment Systems (AAS) for open-ended mathematics problems. While LLMs have demonstrated transformative capabilities across various disciplines, their application in AAS, particularly for mathematical logic and open-ended problem-solving, remains underexplored. Our research addresses this gap by developing and critically evaluating a GPT-4-based AAS. We analyzed 4,180 responses to open-ended mathematics questions from 380 6th-grade primary school students. Three human experts and the GPT-4 model independently assessed these responses using a pre-established rubric. Our findings reveal high consistency between human and GPT-4 assessments in most instances, highlighting the potential of integrating GPT-4 into AAS. We categorized scoring discrepancies between GPT-4 and human raters by error type and identified specific mathematical content areas where automated assessment faced limitations. We evaluated two strategies to enhance GPT-4’s assessment capabilities: (1) using elaborate prompts and (2) implementing advanced prompt engineering techniques such as chain-of-thought, self-consistency, and tree-of-thought. While elaborate prompts significantly improved assessment quality, directly applying advanced prompt engineering techniques produced suboptimal results, indicating a need for further refinement. This study contributes to the emerging body of research evaluating GPT-4 in the context of AAS for open-ended mathematics problems, shedding light on both the strengths and limitations of this approach. Our findings provide a foundation for future research to refine the integration of LLMs in AAS, particularly in mathematics education.
| Original language | English |
|---|---|
| Pages (from-to) | 1560-1596 |
| Number of pages | 37 |
| Journal | International Journal of Artificial Intelligence in Education |
| Volume | 35 |
| Issue number | 3 |
| Publication status | Published - 2025 Sept |
Bibliographical note
Publisher Copyright: © International Artificial Intelligence in Education Society 2024.
ASJC Scopus subject areas
- Education
- Computational Theory and Mathematics