Prediction intervals with tidymodels, best practices?

parsnip can produce prediction intervals for model types that naturally provide them (e.g., linear regression, via `predict()` with `type = "pred_int"`).
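For the case where the model supports intervals natively, a minimal sketch might look like the following. The `mtcars` data and the `mpg ~ wt + hp` formula are just placeholders chosen for illustration.

```r
library(parsnip)

# Ordinary linear regression; the lm engine supports prediction intervals natively.
lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = mtcars)

# A few rows to predict on (illustrative only; these are also training rows).
new_cars <- mtcars[1:3, c("wt", "hp")]

# type = "pred_int" returns a tibble with .pred_lower and .pred_upper columns.
pred_int <- predict(lm_fit, new_data = new_cars, type = "pred_int", level = 0.90)
pred_int
```

Engines without native interval support will throw an error for `type = "pred_int"`, which is where the resampling approaches come in.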

Otherwise... the paper cited in the blog post (and its references) is doable. I would not use the .632 method here since, for some models, the apparent error rate is 0 (the model reproduces the training set exactly, so those residuals tell you nothing). I would also use 10-fold CV to get the residuals (instead of re-predicting the training set).
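To make the 10-fold CV residual idea concrete, here is a rough sketch with rsample and parsnip. It only covers the out-of-sample residual step, not the full method from the paper, and the data, formula, and model are placeholders.

```r
library(rsample)
library(parsnip)

set.seed(1)
folds <- vfold_cv(mtcars, v = 10)

# Any regression model spec could go here; lm is used only to keep it short.
spec <- linear_reg() |> set_engine("lm")

# For each fold: fit on the analysis set, then compute residuals on the
# held-out assessment set, so no row is predicted by a model that saw it.
oos_resid <- unlist(lapply(folds$splits, function(split) {
  fit_k <- fit(spec, mpg ~ wt + hp, data = analysis(split))
  held_out <- assessment(split)
  held_out$mpg - predict(fit_k, new_data = held_out)$.pred
}))

# One out-of-sample residual per row of the original data.
length(oos_resid) == nrow(mtcars)
```

These residuals would then stand in for the noise component of the interval, with the bootstrap fits supplying the model-variability component.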

I have some old caret code lying around to do this (only for regression models). However, as the blog shows, you need to do a large number of bootstrap fits of the model to get good coverage and stability.

A long while back I tried this out at my previous job. There were some issues with the generality of the resampling approach. Imagine a CART tree fit. Outside of the range of the training data, the predictions are flat on either side of the distribution of x, and the intervals can be really misleading there.
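The flat-prediction problem is easy to see with a quick simulation. This is only an illustration with made-up data (a linear trend on x in [0, 10]) and the rpart engine:

```r
library(parsnip)

set.seed(2)
# Simulated training data: x lives in [0, 10] and y grows linearly with x.
train <- data.frame(x = runif(200, min = 0, max = 10))
train$y <- 2 * train$x + rnorm(200)

tree_fit <- decision_tree(mode = "regression") |>
  set_engine("rpart") |>
  fit(y ~ x, data = train)

# Points well beyond the training range all land in the same terminal node.
beyond <- data.frame(x = c(15, 50, 1000))
preds <- predict(tree_fit, new_data = beyond)$.pred
preds
```

All three predictions are identical even though the true trend keeps increasing, so any interval built from resampled residuals around them would be centered in the wrong place.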

I've got a lot going on currently but I'll try to create a gist that does some of this with rsample and parsnip. I won't support it so use it at your own risk. I'll give a link here if/when I have that working.
