Save tidymodels recipe log to disk

Due to business requirements, I need to log recipe steps. Reading through the recipes package options, I found these prep() arguments:

verbose = TRUE,
log_changes = TRUE

My main issue is that the output seems to be truncated: when a step removes many variables, only the first few are shown. For example:

step_corr(all_predictors(), threshold = 0.7)


Correlation filter removed cost, age, identifier, genre, ... [trained]

So this kind of log is incomplete. Business users need this information to check the accuracy of the pre-processing steps, or simply for model documentation.

Is there any way to write a complete log of recipe step execution to a file?

Hi @rdataforge,

The truncation of the message is an artifact of the print() method for a prepped recipe object. However, the names of the variables removed by the correlation step are retained in full as a character vector nested inside the recipe's list structure.

If you want to have a "log" of the recipe preparation, you can simply save the prepared object (po in the example below) to disk (such as with saveRDS()) and it will contain a record of all of the recipe information.

Here is an example:

library(recipes)

po <- recipe(mpg ~ ., mtcars) %>% 
  step_corr(all_predictors()) %>% 
  prep()

print(po)
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         10
#> 
#> Training data contained 32 data points and no missing data.
#> 
#> Operations:
#> 
#> Correlation filter removed cyl [trained]

# Which columns were removed by step 1 (the correlation filter)?
po$steps[[1]]$removals
#> [1] "cyl"
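
If a plain-text log is needed in addition to the saved object, the same idea can be extended to loop over every step. The following is only a minimal sketch: the file names and the log line format are made up for illustration, and it assumes that steps which drop columns store them in a `removals` element, as step_corr() does above. Steps without that element are logged as "(none)".

```r
library(recipes)

po <- recipe(mpg ~ ., data = mtcars) %>%
  step_corr(all_predictors()) %>%
  prep()

# One log line per step, listing the full (untruncated) vector of removals.
# `removals` is NULL for steps that do not drop columns.
log_lines <- vapply(seq_along(po$steps), function(i) {
  step <- po$steps[[i]]
  removed <- step$removals
  removed_txt <- if (length(removed) > 0) {
    paste(removed, collapse = ", ")
  } else {
    "(none)"
  }
  sprintf("Step %d [%s] removed: %s", i, class(step)[1], removed_txt)
}, character(1))

# Hypothetical output locations; replace with your own paths
log_file <- file.path(tempdir(), "recipe_log.txt")
rds_file <- file.path(tempdir(), "prepped_recipe.rds")

writeLines(log_lines, log_file)  # human-readable log for business users
saveRDS(po, rds_file)            # full prepared recipe for later inspection
```

The text file gives reviewers something readable, while the .rds file keeps everything else the prepared recipe recorded. Calling tidy(po, number = 1) should also return the removed columns for the correlation step as a tibble, which may be easier to write out as a table.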

OK, this could work as a workaround. I will write code to save every step to a log file and check it with the business people.
Thanks!
