Controlled Benchmarking of Active Learning for Low-Label ADME Classification Under Scaffold Split
Abstract
Active learning is appealing for low-label ADME modeling, but its value under scaffold split remains difficult to interpret because acquisition rules are often compared under changing representations, budgets, or data partitions. Here we present a controlled benchmark of Random, Entropy, BALD, BADGE, and a calibrated diversity-aware variant (CDB-AL) on seven public ADME binary classification tasks under fixed Bemis-Murcko scaffold splits, five seeds, ten active-learning rounds, and two representation families (ECFP-MLP and ChemBERTa-MLP). We summarize performance with test PR-AUC, area under the learning curve (AULC), budget to 90% of the full-data ceiling, and expected calibration error. Across the 14 dataset-representation slices, active learning improved AULC more consistently than final PR-AUC: the best acquisition function outperformed Random in AULC in all 14 slices, but improved final PR-AUC in only 9. No acquisition function was best everywhere. BADGE provided the most reliable overall baseline, achieving the best AULC in 7 of 14 slices and the lowest mean rank, whereas BALD remained competitive in several ECFP settings and CDB-AL was helpful mainly in selected calibration-sensitive regimes rather than as a universal upgrade. Budget stage and molecular representation both changed the acquisition ranking, suggesting that method choice should be treated as a protocol-dependent workflow decision rather than a fixed default. These results frame active learning for scaffold-split ADME classification primarily as a problem of learning efficiency and conditional method selection, not a search for a single winner.
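For readers unfamiliar with the summary metrics named above, the following is a minimal sketch of how AULC, budget to 90% of a ceiling, and expected calibration error could be computed from a single learning curve. The abstract does not give the exact definitions, so the normalization of AULC by the budget range, the equal-width binning for calibration error, and all function names (`aulc`, `budget_to_fraction`, `expected_calibration_error`) are assumptions made for illustration; the numbers in the usage lines are toy values, not results from the benchmark.

```python
import numpy as np

def aulc(budgets, scores):
    """Trapezoidal area under the learning curve, divided by the budget range.

    Assumption: normalizing by (last budget - first budget) so the value
    stays on the same scale as the underlying score (e.g. PR-AUC).
    """
    b = np.asarray(budgets, dtype=float)
    s = np.asarray(scores, dtype=float)
    area = np.sum((b[1:] - b[:-1]) * (s[1:] + s[:-1]) / 2.0)
    return area / (b[-1] - b[0])

def budget_to_fraction(budgets, scores, ceiling, fraction=0.9):
    """Smallest labeled budget whose score reaches fraction * ceiling.

    Returns None if the curve never reaches the target.
    """
    target = fraction * ceiling
    for b, s in zip(budgets, scores):
        if s >= target:
            return b
    return None

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE with equal-width bins on the predicted positive probability.

    Assumption: each bin contributes |observed positive rate - mean predicted
    probability|, weighted by the fraction of samples falling in that bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Map probabilities to bin indices; a probability of 1.0 goes to the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for k in range(n_bins):
        mask = bin_ids == k
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece

# Toy learning curve: labeled-set size after each round and test PR-AUC.
toy_budgets = [100, 200, 300]
toy_scores = [0.2, 0.4, 0.6]
toy_aulc = aulc(toy_budgets, toy_scores)
toy_budget90 = budget_to_fraction(toy_budgets, toy_scores, ceiling=0.6)
```

Under these assumptions, an acquisition function can raise AULC by climbing earlier while ending at the same final score as Random, which is one way the gap between AULC gains and final PR-AUC gains reported above can arise.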
Citation Information
@article{shenkuangwu2026,
title={Controlled Benchmarking of Active Learning for Low-Label ADME Classification Under Scaffold Split},
author={Shenkuang Wu and Siyuan Zhang and Xuanda Wu},
journal={Journal of Computer-Aided Molecular Design},
year={2026},
doi={10.21203/rs.3.rs-9310145/v1}
}
SinoXiv