Explainable Machine Learning for Predicting Progression From IgA Vasculitis to IgA Vasculitis Nephritis in Children: A Dual-Centre Retrospective Study
Abstract
Objective: IgA vasculitis nephritis (IgAVN) is a key determinant of long-term prognosis in pediatric IgA vasculitis (IgAV), yet the time lag inherent in routine urinalysis often hinders early and precise risk stratification. This study aimed to develop a machine-learning model based on non-invasive peripheral blood parameters to identify progression to IgAVN early and to reveal the underlying pathophysiological risk thresholds.

Methods: This retrospective study enrolled 509 pediatric IgAV patients from Siyang Hospital and Shanxian Central Hospital, 213 of whom developed IgAVN. Twelve core features were selected as the intersection of the features retained by LASSO regression and those confirmed by the Boruta algorithm. Seven machine-learning algorithms were systematically compared, and eXtreme Gradient Boosting (XGBoost) was selected as the optimal model. The Shapley Additive Explanations (SHAP) framework was then applied to quantify feature importance and to characterise non-linear risk interactions.

Results: In the independent validation set, the XGBoost model showed strong predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.966, an accuracy of 0.907, an F1 score of 0.892, and a sensitivity of 0.921. SHAP analysis identified the Inflammatory Burden Index (IBI), C-reactive protein (CRP), and the monocyte-to-lymphocyte ratio (MLR) as the main drivers of prediction. SHAP dependence plots revealed non-linear threshold effects: predicted IgAVN risk rose sharply with early subclinical depletion of albumin (ALB) and with a decompensated inflammatory load.
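The abstract names IBI and MLR as leading predictors and reports accuracy, sensitivity, F1 and AUC without defining them. The Python sketch below shows the standard definitions for illustration only; it is not the authors' code. All function names are hypothetical, and the IBI formula (CRP multiplied by the neutrophil-to-lymphocyte ratio) is the definition commonly used in the literature, assumed here because the abstract does not state it.

```python
from typing import Dict, Sequence


def mlr(monocytes: float, lymphocytes: float) -> float:
    """Monocyte-to-lymphocyte ratio; both counts in the same unit (e.g. 10^9/L)."""
    return monocytes / lymphocytes


def ibi(crp_mg_l: float, neutrophils: float, lymphocytes: float) -> float:
    """Inflammatory Burden Index, ASSUMING the common definition CRP x (neutrophils / lymphocytes)."""
    return crp_mg_l * neutrophils / lymphocytes


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall) and F1 for hard labels (1 = progressed to IgAVN).

    y_pred must already be thresholded probabilities; the cutoff used in the
    study is not stated in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `ibi(10.0, 4.0, 2.0)` returns 20.0 for a CRP of 10 mg/L and a neutrophil-to-lymphocyte ratio of 2. Unlike accuracy, sensitivity and F1, the AUC is computed from continuous risk scores, so it does not depend on the chosen classification cutoff.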
Decision curve analysis (DCA) showed that the model provided substantial net clinical benefit across a wide range of threshold probabilities.

Conclusion: The explainable XGBoost model, built on routine non-invasive peripheral blood parameters, shows promise as a supportive tool for the early risk stratification of IgAVN. By visualising complex, data-driven non-linear risk inflexion points, the model may help frontline clinicians identify high-risk pediatric patients more effectively in outpatient and emergency settings. These findings provide an objective reference for future clinical strategies that aim to optimise the timing of intervention and potentially to minimise unnecessary immunosuppressive exposure in low-risk patients.
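Decision curve analysis compares a model's net benefit with the default strategies of treating all patients or none across a range of threshold probabilities p_t. A minimal sketch of the standard net-benefit formula, NB = TP/n - (FP/n) x p_t/(1 - p_t), is given below. The function names and toy data are hypothetical, and this is not the study's analysis code.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], risk: Sequence[float], pt: float) -> float:
    """Net benefit of intervening when predicted risk >= pt (1 = progressed to IgAVN)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, r in zip(y_true, risk) if r >= pt and t == 1)
    fp = sum(1 for t, r in zip(y_true, risk) if r >= pt and t == 0)
    # False positives are penalised by the odds pt / (1 - pt) implied by the threshold.
    return tp / n - (fp / n) * pt / (1.0 - pt)


def treat_all_net_benefit(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: intervene in every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def decision_curve(y_true: Sequence[int], risk: Sequence[float],
                   thresholds: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Rows of (pt, model NB, treat-all NB, treat-none NB); treat-none is always 0."""
    return [
        (pt, net_benefit(y_true, risk, pt), treat_all_net_benefit(y_true, pt), 0.0)
        for pt in thresholds
    ]


if __name__ == "__main__":
    # Toy labels and predicted risks, not study data.
    y = [1, 1, 0, 0, 1, 0, 0, 0]
    p = [0.92, 0.61, 0.35, 0.08, 0.77, 0.55, 0.12, 0.20]
    for row in decision_curve(y, p, [0.1, 0.2, 0.3, 0.4, 0.5]):
        print("pt=%.1f model=%.3f all=%.3f none=%.1f" % row)
```

The treat-all curve depends only on prevalence: in the pooled cohort (213 of 509 children, about 0.42), it starts near 0.42 at low thresholds and falls to zero when p_t equals the prevalence. A model adds clinical value at a given threshold when its curve lies above both reference strategies.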
Citation Information
@article{qingkaiwang2026,
title={Explainable Machine Learning for Predicting Progression From IgA Vasculitis to IgA Vasculitis Nephritis in Children: A Dual-Centre Retrospective Study},
author={Qingkai Wang and Hao Qiu and Liran Shen and Jinxing Dai and Fachen Miao and Qianjin Shi and Yunbiao Zhang and Kang Shen and Weibing Qiu},
journal={Research Square},
year={2026},
doi={10.21203/rs.3.rs-9261325/v1}
}
SinoXiv