I put the above approach into a couple of rough, quick functions: prep_interval(), which takes a workflow (with a recipe and model specification) and returns a list of the objects needed to produce new prediction intervals, and predict_interval(), which takes that output plus new data and produces prediction intervals for it. See the gist referenced below for documentation. The code below should be essentially equivalent to my prior example with rpart.
...
library(tidyverse)
library(tidymodels)
set.seed(123)
iris <- as_tibble(iris)
split <- initial_split(iris)
train <- training(split)
test <- testing(split)
dt_mod <- parsnip::decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")
dt_rec <- recipe(Sepal.Length ~ Sepal.Width, data = train)
dt_wf <- workflows::workflow() %>%
  add_model(dt_mod) %>%
  add_recipe(dt_rec)
devtools::source_gist("https://gist.github.com/brshallo/3db2cd25172899f91b196a90d5980690")
# Maybe would be better to allow a more custom resamples object as well...
prepped_for_interval <- prep_interval(dt_wf, train)
prepped_for_interval
#> $model_uncertainty
#> # A tibble: 10 x 2
#> fit recipe
#> <list> <list>
#> 1 <fit[+]> <recipe>
#> 2 <fit[+]> <recipe>
#> 3 <fit[+]> <recipe>
#> 4 <fit[+]> <recipe>
#> 5 <fit[+]> <recipe>
#> 6 <fit[+]> <recipe>
#> 7 <fit[+]> <recipe>
#> 8 <fit[+]> <recipe>
#> 9 <fit[+]> <recipe>
#> 10 <fit[+]> <recipe>
#>
#> $sample_uncertainty
#> # A tibble: 113 x 1
#> .resid
#> <dbl>
#> 1 1.25
#> 2 -0.0444
#> 3 0.256
#> 4 -0.100
#> 5 1.75
#> 6 0.556
#> 7 -0.543
#> 8 -0.453
#> 9 0.947
#> 10 -0.443
#> # ... with 103 more rows
pred_interval <- predict_interval(prepped_for_interval, test, probs = c(0.05, 0.95))
pred_interval
#> # A tibble: 37 x 2
#> probs_0.05 probs_0.95
#> <dbl> <dbl>
#> 1 4.26 7.31
#> 2 4.00 7.02
#> 3 3.90 6.82
#> 4 4.40 7.69
#> 5 3.71 6.73
#> 6 4.00 7.01
#> 7 4.26 7.29
#> 8 3.70 6.74
#> 9 4.54 7.88
#> 10 3.91 7.26
#> # ... with 27 more rows
Created on 2021-03-04 by the reprex package (v0.3.0)
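For anyone who cannot source the gist, here is a rough base-R sketch of what the two steps might look like. It is an assumption based only on the printed output above (a set of resampled fits plus a pool of residuals), not the gist's actual code: the names sketch_prep() and sketch_predict() are hypothetical, and using bootstrap resamples with out-of-bag residuals is my guess at the mechanics.

```r
library(rpart)

# Hypothetical stand-in for prep_interval(): fit one tree per bootstrap
# resample (model uncertainty) and pool out-of-bag residuals (sample uncertainty).
sketch_prep <- function(formula, data, n_boot = 10) {
  response <- all.vars(formula)[1]
  fits <- vector("list", n_boot)
  resid_pool <- numeric(0)
  for (b in seq_len(n_boot)) {
    idx <- sample(nrow(data), replace = TRUE)
    fits[[b]] <- rpart(formula, data = data[idx, ])
    oob <- data[-unique(idx), ]  # rows not drawn into this resample
    resid_pool <- c(resid_pool,
                    oob[[response]] - predict(fits[[b]], newdata = oob))
  }
  list(fits = fits, resid = resid_pool)
}

# Hypothetical stand-in for predict_interval(): for each new row, combine
# every model's prediction with every pooled residual and take quantiles.
sketch_predict <- function(prepped, new_data, probs = c(0.05, 0.95)) {
  # rows = new observations, columns = bootstrap fits
  preds <- do.call(cbind, lapply(prepped$fits, predict, newdata = new_data))
  out <- lapply(seq_len(nrow(preds)), function(i) {
    quantile(outer(preds[i, ], prepped$resid, "+"), probs = probs, names = FALSE)
  })
  out <- do.call(rbind, out)
  colnames(out) <- paste0("probs_", probs)
  as.data.frame(out)
}

set.seed(123)
idx <- sample(nrow(iris), 113)
train <- iris[idx, ]
test <- iris[-idx, ]

prepped <- sketch_prep(Sepal.Length ~ Sepal.Width, train)
intervals <- sketch_predict(prepped, test)

# Share of held-out values that fall inside their interval
coverage <- mean(test$Sepal.Length >= intervals$probs_0.05 &
                 test$Sepal.Length <= intervals$probs_0.95)
```

The same coverage check could be applied to pred_interval and test from the reprex above to see how close the empirical coverage is to the nominal 90%.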
@Max the correct approach may be to lean on the research in conformal prediction / inference. I've pasted a few resources I skimmed below, though I need to look into them more closely (much of the research here seems to come out of either Carnegie Mellon or Royal Holloway, University of London):
- ryantibs/conformal: GitHub repo with the conformalInference R package and links to relevant articles on distribution-free predictive inference. conformalInference seems to be set up not too dissimilarly from the set-up above (in that it takes a model-generating algorithm as input), so it seems one could set up an interface or something similar in a way that is pretty tidy-friendly (e.g. add_conformal() ...)
- donlnz/nonconformist: Python package
- Conformal Prediction: link to the Royal Holloway, University of London website by the creators of the method, Vladimir Vovk and Alex Gammerman.
- Assumption-free prediction intervals for black-box regression algorithms - Aaditya Ramdas (YouTube): professor at CMU giving overview of problem, approaches, and current "state-of-the-art"
- Tutorial on conformal inference, Dataiku article, Analytics Vidhya article
The resources suggest some methods have high computational costs (e.g. jackknife+) while others are cheaper (e.g. split conformal), but again, I need to read more closely.
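To make the cheaper end of that spectrum concrete, below is a minimal sketch of split conformal prediction in base R with rpart, as I understand the method from the resources above. The 50/50 split, the absolute-residual score, and the variable names are my own choices for illustration; this does not use the conformalInference package's API.

```r
library(rpart)

set.seed(123)
alpha <- 0.1  # target miscoverage, i.e. a nominal 90% interval

# Hold out a test set, then split the rest into fitting and calibration sets
test_idx <- sample(nrow(iris), 37)
test <- iris[test_idx, ]
rest <- iris[-test_idx, ]
calib_idx <- sample(nrow(rest), floor(nrow(rest) / 2))
calib <- rest[calib_idx, ]
fit_set <- rest[-calib_idx, ]

# 1. Fit the model once, on the fitting set only
fit <- rpart(Sepal.Length ~ Sepal.Width, data = fit_set)

# 2. Conformity scores: absolute residuals on the calibration set
scores <- abs(calib$Sepal.Length - predict(fit, newdata = calib))

# 3. Take the ceiling((n + 1) * (1 - alpha))-th smallest score as the half-width
n_cal <- length(scores)
k <- ceiling((n_cal + 1) * (1 - alpha))
q_hat <- if (k > n_cal) Inf else sort(scores)[k]

# 4. Interval for new data: point prediction +/- q_hat
pred <- predict(fit, newdata = test)
conformal_interval <- data.frame(lower = pred - q_hat, upper = pred + q_hat)
```

The model is fit only once, which is why this is cheap compared with jackknife+. The trade-off is that the interval has the same width for every observation and only part of the training data is used for fitting.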