Undo dependent variable transformations with recipes after predict()

What is the best way to undo a transformation of a dependent variable with tidymodels? What if the transformation uses parameters estimated from the training data (e.g., mean centering or Yeo-Johnson)?

I read through a bunch of tidymodels resources and couldn't find anything. I saw that Max Kuhn and Julia Silge wrote this:

It is best practice to analyze the predictions on the transformed scale (if one were used) even if the predictions are reported using the original units.

But after picking the "best" model with resampling, I sometimes want to untransform my predictions. Here is a reprex. I want to use the sample mean of the training data to un-center the predictions.

library(tidymodels)

# create a recipe that centers the outcome (mpg)
mtcars_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(mpg) %>%
  add_role(mpg, new_role = "response")

# create a model
dt_mod <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")

# create a workflow
mtcars_wflow <- 
  workflow() %>% 
  add_model(dt_mod) %>% 
  add_recipe(mtcars_rec)

# estimate the model
mtcars_fit <- 
  mtcars_wflow %>%
  fit(data = mtcars)
 
# make a new prediction (mtcars reused for convenience)
predict(mtcars_fit, mtcars)
#> # A tibble: 32 x 1
#>    .pred
#>    <dbl>
#>  1 -1.83
#>  2 -1.83
#>  3  6.57
#>  4 -1.83
#>  5 -1.83
#>  6 -1.83
#>  7 -6.68
#>  8  6.57
#>  9  6.57
#> 10 -1.83
#> # … with 22 more rows

Created on 2020-10-20 by the reprex package (v0.3.0)

For now, you'll have to do that manually, probably with dplyr::mutate() calls.
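A minimal sketch of that manual approach for the centering case in the reprex above. It assumes a workflows version recent enough to have `extract_recipe()`. The idea is to pull the trained recipe out of the fitted workflow, read the training-set mean from `tidy()` on the centering step, and add it back with `mutate()`. The `skip = TRUE` argument is an addition to the original reprex. The recipes documentation suggests it for steps that process the outcome, because new data at prediction time may not contain that column.

```r
library(tidymodels)

# Same setup as the reprex; skip = TRUE means the centering is applied to
# the training data during fitting but not to new_data at predict() time
mtcars_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(mpg, skip = TRUE)

mtcars_fit <- workflow() %>%
  add_model(decision_tree() %>% set_engine("rpart") %>% set_mode("regression")) %>%
  add_recipe(mtcars_rec) %>%
  fit(data = mtcars)

# Pull the *trained* recipe out of the fitted workflow
# (older workflows versions used pull_workflow_prepped_recipe() instead)
prepped_rec <- extract_recipe(mtcars_fit, estimated = TRUE)

# tidy() on the centering step (step 1) returns one row per centered column,
# with the training-set mean in the `value` column
center_params <- tidy(prepped_rec, number = 1)
mpg_mean <- center_params$value[center_params$terms == "mpg"]

# Predictions come back on the centered scale; add the training mean back
preds <- predict(mtcars_fit, new_data = mtcars)
preds_orig <- preds %>% mutate(.pred = .pred + mpg_mean)
```

Reading the mean from the trained recipe, rather than recomputing `mean()` on whatever data is at hand, guarantees that the shift is undone using exactly the value estimated from the training set.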

Keep in mind that the Yeo-Johnson step in the recipes package doesn't do exactly what you want for the outcome: it estimates the transformation parameter from the variable's own distribution. For an outcome, you would instead fit an initial model and then use its residuals to estimate the transformation.
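The residual-based procedure could be sketched in base R roughly as follows. This is an illustration under stated assumptions, not a recipes feature. The helper names `yj_forward()`, `yj_inverse()`, and `yj_profile_loglik()` are hypothetical. Using a linear model on all other mtcars columns as the "initial model" is an assumption, as is the `-2` to `4` search interval. Lambda is chosen by Box-Cox-style profile likelihood: the residual sum of squares of the model fit to the transformed outcome, plus the log-Jacobian of the transform. The inverse function is what puts predictions back in the original units. If you use `step_YeoJohnson()` anyway, its estimated lambda is in the `value` column of `tidy()` on the trained step. That value should work with the same piecewise inverse, assuming recipes uses the standard Yeo-Johnson definition.

```r
# Yeo-Johnson transform for a single lambda; eps guards the
# lambda = 0 and lambda = 2 special cases
yj_forward <- function(y, lambda, eps = 0.001) {
  out <- numeric(length(y))
  pos <- y >= 0
  if (abs(lambda) < eps) {
    out[pos] <- log1p(y[pos])
  } else {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-y[!pos])
  } else {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Inverse transform: the forward transform preserves sign, so z >= 0 came
# from y >= 0. Predictions outside the forward transform's range (possible
# when lambda < 0 or lambda > 2) have no valid inverse.
yj_inverse <- function(z, lambda, eps = 0.001) {
  out <- numeric(length(z))
  pos <- z >= 0
  if (abs(lambda) < eps) {
    out[pos] <- expm1(z[pos])
  } else {
    out[pos] <- (lambda * z[pos] + 1)^(1 / lambda) - 1
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -expm1(-z[!pos])
  } else {
    out[!pos] <- 1 - (1 - (2 - lambda) * z[!pos])^(1 / (2 - lambda))
  }
  out
}

# Profile log-likelihood of lambda (up to an additive constant) when the
# transformed outcome is modeled by least squares on design matrix X
yj_profile_loglik <- function(lambda, y, X) {
  z <- yj_forward(y, lambda)
  rss <- sum(lm.fit(X, z)$residuals^2)
  n <- length(y)
  # second term: log-Jacobian of the Yeo-Johnson transform
  -n / 2 * log(rss / n) + (lambda - 1) * sum(sign(y) * log1p(abs(y)))
}

# Initial model: linear model on all other columns (an assumption)
X <- model.matrix(mpg ~ ., data = mtcars)
lambda_hat <- optimize(yj_profile_loglik, interval = c(-2, 4),
                       y = mtcars$mpg, X = X, maximum = TRUE)$maximum

# Transform the outcome, then undo it
z <- yj_forward(mtcars$mpg, lambda_hat)
mpg_back <- yj_inverse(z, lambda_hat)
```

In practice `yj_inverse()` would be applied to the `.pred` column returned by `predict()`, in the same way the training mean is added back in the centering case.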

We're getting over the hump on filling in some of the gaps in tune and parsnip. I don't think we'll have a solution for you this calendar year, but post-processing is on the list and we are diligently working through the top issues. Although we set our own priorities, the user survey we ran on Twitter did not rank post-processing in the top 5 (it was #6).
