Can AI Replicate Legal Precision? An Empirical Assessment of Legal Hallucinations in Large Language Models
Abstract
In recent years, Large Language Models (LLMs) have seen growing use in the legal field. As they have evolved, their role has shifted from simple legal knowledge extraction to reasoning over and auditing systematic legal knowledge. However, despite the continuous iteration and upgrading of LLMs, hallucinations (AI-generated content that appears reasonable but is actually inconsistent with the facts or with the input information) remain prevalent. This poses unprecedented challenges to the legal field, which demands a high standard of rigor. This study conducts a systematic empirical evaluation of LLMs commonly used in the market on legal tasks. The results show that although legal-specific LLMs exhibit a lower hallucination rate than general-purpose LLMs, the rate remains concerning. The paper's core contributions are as follows: first, it systematically evaluates and reports the performance of common LLMs in the legal field; second, it constructs a four-section dataset based on legal knowledge Q&A to evaluate legal hallucinations; and third, it conducts a comparative analysis of how hallucination trade-off standards differ between legal knowledge Q&A and general knowledge Q&A. The results indicate that the application of LLMs in the legal field still requires improvement, and that users must take responsibility for overseeing and verifying the legal conclusions generated by LLMs.
Keywords
Citation Information
@article{luxu2026,
title={Can AI Replicate Legal Precision? An Empirical Assessment of Legal Hallucinations in Large Language Models},
author={Lu Xu and Xingyu Wan and Hao Zou and Jiahao Luo and Ruolong Ma and Qingyu Liang},
journal={Research Square},
year={2026},
doi={10.21203/rs.3.rs-9143804/v1}
}
SinoXiv