Prediction intervals with tidymodels, best practices?

Max · September 29, 2020, 3:15pm

parsnip can produce prediction intervals for model types that naturally provide them.

Otherwise... the methods in the paper cited in the blog post (and in its references) are doable. I would not use the .632 method here since, for some models, the apparent error rate is 0. I would also use 10-fold CV to get the residuals (instead of re-predicting the training set).
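
To make the idea concrete, here is a rough sketch in plain Python (not tidymodels code, and not the caret code mentioned in this thread). It assumes a simplified version of the resampling approach: bootstrap refits capture the model's variability, 10-fold CV gives out-of-fold residuals, and the two are added together with no .632 weighting. A simple least-squares line stands in for whatever regression model you are using, and every function name below is made up for illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return a + b * x

def cv_residuals(xs, ys, k=10, rng=random):
    """Out-of-fold residuals: each point is predicted by a fit that never saw it."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    resid = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        resid.extend(ys[i] - predict(model, xs[i]) for i in held_out)
    return resid

def bootstrap_predictions(xs, ys, x_new, n_boot=500, rng=random):
    """Refit on bootstrap resamples and predict at x_new each time."""
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        s = [rng.randrange(n) for _ in range(n)]
        model = fit_line([xs[i] for i in s], [ys[i] for i in s])
        preds.append(predict(model, x_new))
    return preds

def quantile(sorted_vals, p):
    """Linear-interpolation quantile of already sorted values."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def prediction_interval(xs, ys, x_new, level=0.90, n_boot=500, k=10, seed=1):
    rng = random.Random(seed)
    point = predict(fit_line(xs, ys), x_new)
    boot = bootstrap_predictions(xs, ys, x_new, n_boot, rng)
    center = statistics.fmean(boot)
    resid = cv_residuals(xs, ys, k, rng)
    # Model variability (centered bootstrap predictions) plus noise (CV residuals).
    combined = sorted((m - center) + r for m in boot for r in resid)
    alpha = 1 - level
    return (point + quantile(combined, alpha / 2),
            point,
            point + quantile(combined, 1 - alpha / 2))

# Simulated data (an assumption for the demo): y = 2 + 0.5 x + N(0, 1) noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

lower, point, upper = prediction_interval(xs, ys, x_new=5.0)
print(f"90% interval at x = 5: [{lower:.2f}, {upper:.2f}], point = {point:.2f}")
```

The n_boot argument is the expensive part: each bootstrap replicate is a full model refit, which is cheap for a straight line but not for most models you would actually want this for.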

I have some old caret code lying around to do this (only for regression models). However, as the blog shows, you need to do a large number of bootstrap fits of the model to get good coverage and stability.

A long while back I tried this out at my previous job. There were some issues with the generality of the resampling approach. Imagine a CART tree fit. Outside of the data range, the predictions are flat on either side of the distribution of x, and intervals can be really misleading there.
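
The flat-extrapolation problem is easy to see with a toy example. The sketch below is plain Python with a hand-rolled, depth-limited 1-D regression tree (not rpart or any real CART implementation; fit_tree and predict_tree are illustrative names) fit to a noise-free linear trend. Past the last split, every x gets the same leaf mean no matter how far out it is.

```python
import statistics

def sse(vals):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(vals)
    return sum((v - m) ** 2 for v in vals)

def fit_tree(xs, ys, depth=3, min_leaf=5):
    """Tiny 1-D regression tree. A leaf is a float (the mean of y);
    an internal node is a tuple (threshold, left_subtree, right_subtree)."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return statistics.fmean(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return statistics.fmean(ys)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]],
                    depth - 1, min_leaf)
    right = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]],
                     depth - 1, min_leaf)
    return (threshold, left, right)

def predict_tree(tree, x):
    """Walk down the tree until a leaf (a plain float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

# Training data covers x in [0, 9.8]; the true relationship is y = 3x.
xs = [i / 5 for i in range(50)]
ys = [3.0 * x for x in xs]
tree = fit_tree(xs, ys)

for x_new in (9.8, 20.0, 1000.0):
    print(x_new, predict_tree(tree, x_new), "true value:", 3.0 * x_new)
```

Every bootstrap refit of a tree like this is flat beyond its own outermost split too, so an interval built from those refits plus residuals is about as wide at x = 1000 as it is at the edge of the training data, even though the prediction there is essentially a guess.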

I've got a lot going on currently but I'll try to create a gist that does some of this with rsample and parsnip. I won't support it so use it at your own risk. I'll give a link here if/when I have that working.