Systematic Review 2026-04-23 posted v1

Small Language Models in Clinical Medicine: A Systematic Review of Performance, Safety, and Deployment Feasibility

Alon Gorenshtein, BRIDGE GenAI Lab, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Mahmud Omar, BRIDGE GenAI Lab, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Yiftach Barash, BRIDGE GenAI Lab, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Jonathan B. Kruskal, Department of Radiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Muneeb Ahmed, Department of Radiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Olga R. Brook, Department of Radiology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
Ben Illigens, Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Girish N. Nadkarni, The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, New York, NY, USA
Eyal Klang, BRIDGE GenAI Lab, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA

Abstract

Large language models are increasingly used in clinical medicine, but their reliance on cloud servers conflicts with patient-privacy requirements and excludes resource-limited healthcare systems. Small language models (SLMs) of up to four billion parameters can run locally on a single commodity GPU, keeping data inside the institution while reaching performance comparable to much larger systems. Here we systematically review 14 studies that deploy SLMs for clinical prediction, information extraction, and medical question answering. Domain-adapted small models reached a median 91% of the best reported performance of larger baselines, and we found no significant correlation between parameter count and task accuracy. Only half of the studies evaluated hallucination rates, and none reported calibration or epistemic uncertainty. The computational case for on-premise clinical AI is therefore strong, but the safety engineering required for responsible deployment, particularly in agentic sub-agent pipelines, is largely absent.

Citation Information

@article{alongorenshtein2026,
  title={Small Language Models in Clinical Medicine: A Systematic Review of Performance, Safety, and Deployment Feasibility},
  author={Alon Gorenshtein and Mahmud Omar and Yiftach Barash and Jonathan B. Kruskal and Muneeb Ahmed and Olga R. Brook and Ben Illigens and Girish N. Nadkarni and Eyal Klang},
  journal={Research Square},
  year={2026},
  doi={10.21203/rs.3.rs-9488729/v1}
}