DTKF: A Sepsis Early Prediction Framework Based on Large Language Model Semantic Augmentation and Dual-Source Knowledge Distillation
Abstract
Background: Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. Early and accurate prediction is crucial for reducing mortality. However, clinical settings often face challenges such as data scarcity, class imbalance, and the underutilization of unstructured clinical notes. Existing scoring systems and traditional machine learning models primarily rely on structured vital signs, failing to capture the rich semantic risk factors embedded in textual data. Although Large Language Models (LLMs) excel in text understanding, their high computational costs and potential hallucinations hinder direct deployment in resource-constrained emergency settings.

Methods: To address these limitations, this study proposes a novel Dual-Teacher Knowledge Fusion (DTKF) framework. First, we introduce a semantic feature augmentation module that leverages DeepSeek to extract semantic risk probabilities from clinical notes and explicitly fuses them with structured vital signs to construct high-order hybrid features for the student model. Second, to prevent overfitting under small-sample conditions, we design a dual-source knowledge distillation strategy. This strategy integrates rule-based knowledge and data-driven semantic knowledge to construct hybrid teacher signals, regulating a lightweight logistic regression student model via soft-label supervision.

Results: Experiments on the MIMIC-IV dataset demonstrate that DTKF achieves an AUC of 0.796. This result not only significantly outperforms traditional unimodal baselines but also improves performance by 6.6% compared to direct prediction using LLMs.

Conclusions: This study effectively integrates rule-based and data-driven knowledge, enhances feature representation using textual information, and improves model performance and robustness through distillation techniques, providing a clinically valuable solution for sepsis early warning.
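The pipeline in the Methods paragraph can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not specify how features are fused or how the two teachers are combined, so everything below is an assumption made for illustration. In particular, the fusion by interaction products in `hybrid_features`, the convex mixing weight `alpha` in `teacher_signal`, the hard/soft blend `lam` in `train_student`, and the synthetic data in `make_toy_data` are all hypothetical. No LLM is called; the semantic risk probability is simulated.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hybrid_features(vitals, llm_prob):
    # Assumed fusion: raw vitals, the LLM risk probability, and their products.
    return list(vitals) + [llm_prob] + [v * llm_prob for v in vitals]


def teacher_signal(rule_prob, llm_prob, alpha=0.5):
    # Assumed mixing: convex combination of rule-based and semantic teachers.
    return alpha * rule_prob + (1.0 - alpha) * llm_prob


def train_student(X, hard, soft, lam=0.5, lr=0.1, epochs=300):
    # Logistic regression fitted by batch gradient descent on a blend of
    # hard labels and teacher soft labels (soft-label cross-entropy).
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y, t in zip(X, hard, soft):
            target = lam * y + (1.0 - lam) * t
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - target  # gradient of cross-entropy w.r.t. the logit
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    # Student risk probability for one hybrid feature vector.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def make_toy_data(n=200, seed=0):
    # Synthetic stand-in for standardized vitals and simulated teacher outputs.
    rng = random.Random(seed)
    vitals, llm, rule, hard = [], [], [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        v = [rng.gauss(1.5 * y, 1.0), rng.gauss(1.0 * y, 1.0)]
        p = 0.3 + 0.4 * y + rng.gauss(0.0, 0.15)
        vitals.append(v)
        llm.append(min(max(p, 0.01), 0.99))  # simulated LLM risk probability
        rule.append(sigmoid(v[0] - 0.5))     # hypothetical rule-based score
        hard.append(y)
    return vitals, llm, rule, hard


if __name__ == "__main__":
    vitals, llm, rule, hard = make_toy_data()
    X = [hybrid_features(v, p) for v, p in zip(vitals, llm)]
    soft = [teacher_signal(r, p) for r, p in zip(rule, llm)]
    w, b = train_student(X, hard, soft)
    print("student weights:", [round(wi, 3) for wi in w], "bias:", round(b, 3))
```

Blending the hard label with the teacher signal keeps the targets away from 0 and 1, which is one common way soft-label supervision regularizes a small model on limited data. Whether DTKF uses this exact blend is not stated in the abstract.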
Keywords
Citation Information
@article{yitianzhang2026,
title={DTKF: A Sepsis Early Prediction Framework Based on Large Language Model Semantic Augmentation and Dual-Source Knowledge Distillation},
author={Yitian Zhang and Xinlei Ji and Huijie Zhang and Liang Zhou and Jiachen Guo and Lei Yin},
journal={BMC Medical Informatics and Decision Making},
year={2026},
doi={10.21203/rs.3.rs-9199143/v1}
}