Small language models learn enhanced reasoning skills from medical textbooks

Abstract
Small language models (SLMs) offer promise for medical applications by addressing the privacy and hardware constraints of large language models; however, their limited parameter counts (often fewer than ten billion) hinder multi-step reasoning on complex medical tasks. This study presents Meerkat, a new family of medical SLMs designed to be lightweight while offering enhanced reasoning capabilities. We begin by designing an effective and efficient training method: we extract high-quality chain-of-thought reasoning paths from 18 medical textbooks and combine them with diverse instruction-following datasets from the medical domain, totaling 441K training examples. We then fine-tuned open-source SLMs on this curated dataset. Our Meerkat-7B and Meerkat-8B models outperformed their counterparts by 22.3% and 10.6%, respectively, across six exam datasets. They also improved scores on the NEJM Case Challenge from 7 to 16 and from 13 to 20, surpassing the average human score of 13.7. Additionally, they demonstrated superiority in expert evaluations, excelling in all four metrics of reasoning ability: completeness, factuality, clarity, and logical consistency.
| Original language | English |
|---|---|
| Article number | 240 |
| Journal | npj Digital Medicine |
| Volume | 8 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Dec 2025 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025.
ASJC Scopus subject areas
- Medicine (miscellaneous)
- Health Informatics
- Computer Science Applications
- Health Information Management