Research Article 2026-04-22 posted v1

On the inescapable bias in random forests: sources, manifestations, and corrections

Matthew Berkowitz Simon Fraser University
Rachel MacKay Altman Simon Fraser University
Thomas M. Loughin Simon Fraser University

Abstract

In regression settings, random forests (RFs) often produce unavoidably biased estimates and predictions. We explain sources of bias in terms of RF-based estimated conditional distribution functions (ECDFs). For given covariate values, the RF ECDF is typically based on observations that are not identically distributed, which can produce bias in the ECDF and in mean or quantile estimates. Bias is especially pronounced in sparsely populated regions and when tail quantiles are estimated, as with prediction intervals. We distinguish distal and proximal sources of bias, show how they manifest differently, and explain how tuning parameters and data complexity contribute to ECDF bias. We propose a two-stage bias-correction procedure to reduce bias in the ECDF and in estimates derived from it, including means and quantiles. Using an estimate of the relationship between the RF ECDF and the covariates, we develop a bias adjustment for the entire ECDF and derived estimates. In the settings considered, our procedure was more effective than other procedures at reducing conditional bias in median (0.5-quantile) estimates while maintaining or reducing mean squared error (MSE). We also show settings in which its conditional bias adjustment yields prediction intervals that are valid over a larger region, have less coverage error, or both, compared with other methods.
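As a rough illustration of why pooling non-identically distributed observations biases a conditional distribution estimate, the following minimal sketch replaces a forest with a single hypothetical "leaf" that places uniform weights on all training points with covariate in [0, 0.2). This is a simplification, not the procedure studied in the paper: the data-generating model (y = 2x plus Gaussian noise), the leaf boundaries, and the function names `leaf_weights`, `weighted_ecdf`, and `weighted_mean` are all assumptions made for the example.

```python
import random

def leaf_weights(xs, lo, hi):
    """Uniform weights on training points with covariate in [lo, hi).

    Assumption: a stand-in for forest weights, which in practice average
    such leaf memberships over many trees.
    """
    inside = [lo <= x < hi for x in xs]
    n_inside = sum(inside)
    return [1.0 / n_inside if flag else 0.0 for flag in inside]

def weighted_ecdf(ys, weights, t):
    """Weighted estimate of P(Y <= t | X = x0): total weight on responses <= t."""
    return sum(w for y, w in zip(ys, weights) if y <= t)

def weighted_mean(ys, weights):
    """Weighted estimate of E[Y | X = x0]."""
    return sum(w * y for y, w in zip(ys, weights))

# Hypothetical data: the conditional mean 2x changes across the leaf, so the
# pooled responses are not identically distributed.
rng = random.Random(0)
n = 2000
xs = [i / n for i in range(n)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

# Target point on the boundary of the covariate range.
x0 = 0.0
weights = leaf_weights(xs, 0.0, 0.2)

true_mean = 2.0 * x0  # also the true conditional median at x0
est_mean = weighted_mean(ys, weights)
est_cdf_at_true_median = weighted_ecdf(ys, weights, true_mean)

print("estimated mean at x0:", est_mean, "true mean:", true_mean)
print("estimated CDF at the true median:", est_cdf_at_true_median, "true value: 0.5")
```

Because every pooled observation other than those at x0 has a larger conditional mean, the weighted mean sits near the average of 2x over the leaf (about 0.2) rather than at 0, and the estimated distribution function at the true median falls well below 0.5. The same shift carries over to any quantile read off the weighted distribution.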

Citation Information

@article{matthewberkowitz2026,
  title={On the inescapable bias in random forests: sources, manifestations, and corrections},
  author={Matthew Berkowitz and Rachel MacKay Altman and Thomas M. Loughin},
  journal={Research Square},
  year={2026},
  doi={10.21203/rs.3.rs-9431439/v1}
}