Research Article 2026-04-22 under-review v1

Controlled Benchmarking of Active Learning for Low-Label ADME Classification Under Scaffold Split

Shenkuang Wu Hunan Normal University
Siyuan Zhang The Second People's Hospital of Hunan Province / Brain Hospital of Hunan Province
Xuanda Wu Hunan Normal University

Abstract

Active learning is appealing for low-label ADME modeling, but its value under scaffold split remains difficult to interpret because acquisition rules are often compared under changing representations, budgets, or data partitions. Here we present a controlled benchmark of Random, Entropy, BALD, BADGE, and a calibrated diversity-aware variant (CDB-AL) on seven public ADME binary classification tasks under fixed Bemis-Murcko scaffold splits, five seeds, ten active-learning rounds, and two representation families (ECFP-MLP and ChemBERTa-MLP). We summarize performance with test PR-AUC, area under the learning curve (AULC), budget to 90% of the full-data ceiling, and expected calibration error. Across the 14 dataset-representation slices, active learning improved AULC more consistently than final PR-AUC: the best acquisition function outperformed Random in AULC in all 14 slices but improved final PR-AUC in only 9. No acquisition function was best everywhere. BADGE provided the most reliable overall baseline, achieving the best AULC in 7 of 14 slices and the lowest mean rank, whereas BALD remained competitive in several ECFP settings and CDB-AL was helpful mainly in selected calibration-sensitive regimes rather than as a universal upgrade. Budget stage and molecular representation both changed the acquisition ranking, suggesting that method choice should be treated as a protocol-dependent workflow decision rather than a fixed default. These results frame active learning for scaffold-split ADME classification primarily as a problem of learning efficiency and conditional method selection, not a search for a single winner.
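For readers unfamiliar with the summary metrics named above, the following is a minimal sketch of how AULC, budget to 90% of a ceiling, and expected calibration error could be computed from one learning curve. It is an illustration under stated assumptions, not the authors' implementation: the function names, the trapezoidal normalization of AULC, and the equal-width binning on the predicted positive-class probability are all assumptions, and the abstract does not specify the exact definitions used.

```python
import numpy as np


def aulc(budgets, scores):
    """Normalized area under the learning curve (trapezoidal rule).

    Assumption: the area is divided by the budget range, so a flat curve
    at score s yields an AULC of s.
    """
    b = np.asarray(budgets, dtype=float)
    s = np.asarray(scores, dtype=float)
    area = np.sum((b[1:] - b[:-1]) * (s[1:] + s[:-1]) / 2.0)
    return float(area / (b[-1] - b[0]))


def budget_to_fraction(budgets, scores, ceiling, fraction=0.9):
    """Smallest labeled budget whose score reaches fraction * ceiling.

    Returns None if the curve never reaches the target.
    """
    target = fraction * ceiling
    for b, s in zip(budgets, scores):
        if s >= target:
            return b
    return None


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width bins on the positive-class probability.

    Assumption: each bin contributes |mean(prob) - mean(label)| weighted
    by the fraction of samples that fall in it.
    """
    p = np.asarray(probs, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Map each probability to a bin index; p == 1.0 goes into the last bin.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)
```

Under these assumptions, each acquisition function would yield one `(budgets, scores)` curve per seed, where `scores` holds test PR-AUC after each round, and the metrics would then be averaged over seeds before methods are ranked within a dataset-representation slice.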

Citation Information

@article{shenkuangwu2026,
  title={Controlled Benchmarking of Active Learning for Low-Label ADME Classification Under Scaffold Split},
  author={Shenkuang Wu and Siyuan Zhang and Xuanda Wu},
  journal={Journal of Computer-Aided Molecular Design},
  year={2026},
  doi={10.21203/rs.3.rs-9310145/v1}
}