Predictive Modeling of Post-Allogeneic Transplant Outcomes Using Machine Learning and Integrated Clinical and Immunogenetic Data: a Study from the SFGM-TC and a Multicenter US Consortium
Abstract
Allogeneic hematopoietic cell transplantation (allo-HCT) remains the only curative option for many hematologic malignancies, yet transplant-related toxicity and relapse continue to constrain long-term benefit. Prognostic tools based on clinical variables alone show limited transportability across centers. We hypothesized that integrating immunogenetic architecture, captured by HLA Evolutionary Divergence (HED), into machine-learning survival models would improve prediction of graft-versus-host disease–free/relapse-free survival (GRFS). We developed SMART, a time-dependent framework that estimates dynamic GRFS probabilities after allo-HCT. We analyzed 16,028 adults transplanted between 2010 and 2022. Model development used the French SFGM-TC registry (N = 13,979), split into training (N = 9,840) and held-out test (N = 4,139) sets. Random survival forests, XGBoost-Cox, and elastic-net Cox models were trained using 9 clinical predictors, with or without 10 recipient/donor locus-specific HED features (19 predictors total), and were externally evaluated in 616 patients with complete data from five U.S. centers. Across algorithms, discrimination for this composite endpoint was modest (c-index < 0.60), but adding HED consistently improved performance and enabled reproducible stratification into low-, intermediate-, and high-risk groups based on the cumulative hazard score. Model-based simulations uncovered a non-linear (U-shaped) association between total HED and GRFS, with optimal outcomes at intermediate HED levels (~75th percentile). In haploidentical transplantation (N = 2,056), outcomes were maximized when donor and recipient HED were concordant (“match like with like”). In 9/10 mismatched unrelated donor transplants (N = 1,326), HLA-B mismatches showed the greatest HED sensitivity. Integrating immunogenetics with clinical data improves GRFS risk modeling and supports HED as an actionable feature for donor selection and pre-transplant risk stratification.
SMART is available for research use.
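For readers unfamiliar with the two evaluation ideas named in the abstract, the sketch below illustrates them in plain Python: Harrell's concordance index (c-index) for right-censored data, and a three-way split of patients by a risk score. It is not the SMART implementation. The function names are hypothetical, tied event times are simply skipped, and the tertile cut points are an assumption, since the abstract does not state how the low-, intermediate-, and high-risk thresholds were chosen.

```python
from typing import List, Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's c-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when patient i also has the higher risk score;
    tied risk scores count as half. Tied times are ignored (simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; c-index is undefined")
    return concordant / comparable


def risk_tertiles(risk_scores: Sequence[float]) -> List[str]:
    """Label each score low/intermediate/high using tertile cut points.

    Tertiles are an illustrative assumption, not the thresholds used in
    the study.
    """
    ordered = sorted(risk_scores)
    n = len(ordered)
    low_cut = ordered[n // 3]
    high_cut = ordered[(2 * n) // 3]
    labels = []
    for score in risk_scores:
        if score < low_cut:
            labels.append("low")
        elif score < high_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values below 0.60 are described as modest. In practice the risk score passed to both functions would be a model output, such as the cumulative hazard predicted for each patient.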
Citation Information
@article{simonapagliuca2026,
title={Predictive Modeling of Post-Allogeneic Transplant Outcomes Using Machine Learning and Integrated Clinical and Immunogenetic Data: a Study from the SFGM-TC and a Multicenter US Consortium},
author={Simona Pagliuca and Vincent Alcazer and Mélanie Gaudfrin and Nicole Raus and Anne Huynh and Raynier Devillier and Arda Dumaz and Ashwin Kishtagari and Lukasz Gondek and Mark Juckett and Suresh Kumar Balasubramanian and Marie Robin and Ibrahim Yakoub-Agha and Edouard Forcade and Claude Eric Bulabois and Jean-Baptiste Méar and Patrice Chevallier and Cristina Castilla Llorente and Xavier Poiré and Michaël Loschi and Johan Maertens and Stéphanie Nguyen and Frédéric Baron and Etienne Daguindau and Natacha Maillard and Florent Malard and Alice Aarnink and Maud D’Aveni-Piney and Marie Thérèse Rubio and Francesca Ferraro and Valeria Visconte and Tobias Lenz and Carmelo Gurnari and Jaroslaw Maciejewski},
journal={Research Square},
year={2026},
doi={https://doi.org/10.21203/rs.3.rs-9383323/v1}
}