Generating Predictive Distributions from Prediction Errors
Abstract: Breiman's seminal paper on random forests has more than 70,000 citations according to Google Scholar. The impact of Breiman's random forests on machine learning, data analysis, data science, and science in general is difficult to measure but unquestionably substantial. The virtues of random forest methodology include no need to specify functional forms relating predictors to a response variable, capable performance for low-sample-size high-dimensional data, general prediction accuracy, easy parallelization, few tuning parameters, and applicability to a wide range of prediction problems with categorical or continuous responses. Like many algorithmic approaches to prediction, random forests are typically used to produce point predictions that are not accompanied by information about how far those predictions may be from true response values. From the statistical point of view, this is unacceptable; a key characteristic that distinguishes statistically rigorous approaches to prediction from others is the ability to provide quantifiably accurate assessments of prediction error from the same data used to generate point predictions.
This talk will describe how we efficiently compute observable prediction errors during random forest construction and use those errors to generate predictive distributions to accompany random forest point predictions. We illustrate the effectiveness of the proposed approach by constructing random forest prediction intervals that perform well relative to competing methods for a variety of real datasets. We also illustrate how to use collections of scaled prediction errors from a linear model fit to estimate win probabilities in NCAA basketball tournament games. These game-specific win probabilities are then used to predict outcomes for the 2020 NCAA tournament, which was cancelled last year due to COVID-19. We estimate the chance that the defending champion Virginia Cavaliers would have won the NCAA Championship again in 2020 and present win probabilities for the most likely winners of the men’s and women’s tournaments.
This talk features joint work with Chancellor Johnstone, Haozhe Zhang, Joshua Zimmerman, and Dan Nordman.
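As a rough illustration of the idea in the abstract, the sketch below turns a collection of out-of-sample prediction errors into a predictive distribution by adding each error to a point prediction. It is a minimal sketch under stated assumptions, not the speakers' procedure: the function names, the toy error values, and the simple inverse-CDF quantile rule are all hypothetical, the errors are assumed to have been computed already (for example, as out-of-bag errors from a random forest), and the scaling of errors mentioned for the linear-model application is omitted.

```python
import math


def empirical_quantile(sorted_values, q):
    """Return the q-th empirical quantile via the inverse CDF (no interpolation)."""
    n = len(sorted_values)
    # Smallest index k with (k + 1) / n >= q, clamped to a valid position.
    k = min(max(math.ceil(q * n) - 1, 0), n - 1)
    return sorted_values[k]


def prediction_interval(point_prediction, errors, alpha=0.1):
    """Shift the empirical error quantiles by the point prediction.

    `errors` are assumed to be out-of-sample errors (observed minus predicted).
    """
    s = sorted(errors)
    lower = point_prediction + empirical_quantile(s, alpha / 2)
    upper = point_prediction + empirical_quantile(s, 1 - alpha / 2)
    return lower, upper


def win_probability(predicted_margin, errors):
    """Fraction of simulated margins (prediction + error) that are positive."""
    wins = sum(1 for e in errors if predicted_margin + e > 0)
    return wins / len(errors)


# Hypothetical illustrative errors, not data from the talk.
errors = [-4.0, -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0, 5.0]

# Interval around a point prediction of 10.0 with nominal 80% coverage.
lo, hi = prediction_interval(10.0, errors, alpha=0.2)

# Estimated chance that a team predicted to win by 1.5 points actually wins.
p = win_probability(1.5, errors)
```

The same collection of errors serves both purposes: its quantiles give an interval for a continuous response, and the share of errors that leave the predicted margin positive gives a win probability.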
Xiwei Tang is inviting you to a scheduled Zoom meeting.
Topic: Statistics Colloquium Spring 2021
Time: This is a recurring meeting. Meet anytime.
Join Zoom Meeting
Meeting ID: 913 7582 2348