Extract formula from recipes::recipe() object?

Hello. I am working with the vignette of the new rsample package located here. This package is used in conjunction with recipes as part of Max Kuhn's tidy modeling approach. Is there a way to extract the model formula from a recipe class object?

In the code block below, taken from the vignette linked above, the model formula is first specified in rec <- recipe(Sale_Price ~ Neighborhood + House_Style + Year_Sold + Lot_Area, data = ames) and then respecified in map(bt_samples$recipes, fit_lm, Sale_Price ~ .) (the last line). It would be great to pull the formula straight from the recipe object and pass it to lm for fitting, instead of specifying the same thing twice. I was thinking of something like extract_formula(rec), but I cannot find the formula in the rec object as it is currently defined.

Any thoughts would be appreciated!

library(rsample)
library(recipes)
library(AmesHousing)
library(purrr)  # provides map(), used below
ames <- make_ames()
set.seed(7712)
bt_samples <- bootstraps(ames)

rec <- recipe(Sale_Price ~ Neighborhood + House_Style + Year_Sold + Lot_Area, 
              data = ames) %>%
  step_log(Sale_Price, base = 10) %>%
  step_other(Neighborhood, House_Style, threshold = 0.05) %>%
  step_dummy(all_nominal()) %>%
  step_BoxCox(Lot_Area) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) 

fit_lm <- function(rec_obj, ...) 
  lm(..., data = juice(rec_obj, everything()))

bt_samples$recipes <- map(bt_samples$splits, prepper, 
                          recipe = rec, retain = TRUE, verbose = FALSE)
bt_samples$lm_mod <-  map(bt_samples$recipes, fit_lm, Sale_Price ~ .)

Keep in mind that the formula might change from resample-to-resample given the use of step_other (or any of the filter steps).
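To make that concrete, here is a minimal base R sketch of why pooling rare levels can change the set of columns between resamples. The pool_rare() helper and the two "resamples" are made up for illustration; the helper only mimics the idea behind step_other and is not how recipes implements it.

```r
# Hypothetical helper: levels whose relative frequency falls below
# `threshold` are pooled into a single "other" level.
pool_rare <- function(x, threshold = 0.05) {
  freq <- table(x) / length(x)
  keep <- names(freq)[freq >= threshold]
  ifelse(x %in% keep, x, "other")
}

# Two made-up resamples of the same categorical variable (40 rows each).
resample_1 <- c(rep("A", 20), rep("B", 16), rep("C", 4))  # C is 10%: kept
resample_2 <- c(rep("A", 22), rep("B", 17), rep("C", 1))  # C is 2.5%: pooled

levels_1 <- sort(unique(pool_rare(resample_1)))
levels_2 <- sort(unique(pool_rare(resample_2)))

# The level sets differ, so dummy variables built from them (and any
# formula that lists those dummy variables) would differ as well.
print(levels_1)
print(levels_2)
```

Since the dummy-variable names are derived from the surviving levels, a formula extracted from one prepared recipe is not guaranteed to match the one from another resample.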

Here's some code that uses the summary method for recipes:

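extract_formula <- function(x) {
  # summary() on a recipe returns one row per variable, including its role
  x <- summary(x)
  x_vars <- x$variable[x$role == "predictor"]
  y_vars <- x$variable[x$role == "outcome"]  
  
  # collapse each side into a single string, e.g. "a+b+c"
  x_vars <- paste0(x_vars, collapse = "+")
  y_vars <- paste0(y_vars, collapse = "+")
  
  # join the two sides with "~" and convert the string to a formula
  as.formula(paste(y_vars, x_vars, sep = "~"))
}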

This would need to be run on a fully prepared recipe.
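The role-based logic can also be tried without any packages installed. The following is a minimal sketch, assuming a hand-built data frame (var_info, with made-up values) that mimics the variable and role columns used by extract_formula() above. It swaps paste()/as.formula() for base R's reformulate(), which expects a single outcome when the response is given as a string.

```r
# Hypothetical stand-in for the summary of a prepared recipe:
# one row per column, with the role that column plays.
var_info <- data.frame(
  variable = c("Sale_Price", "Year_Sold", "Lot_Area", "Neighborhood_other"),
  role     = c("outcome", "predictor", "predictor", "predictor"),
  stringsAsFactors = FALSE
)

# Same idea as extract_formula(), but built with reformulate().
# Assumes exactly one outcome variable.
formula_from_roles <- function(info) {
  x_vars <- info$variable[info$role == "predictor"]
  y_vars <- info$variable[info$role == "outcome"]
  reformulate(termlabels = x_vars, response = y_vars)
}

f <- formula_from_roles(var_info)
print(f)
```

With a real prepared recipe, the result of extract_formula() could presumably be handed to fit_lm in place of the hard-coded Sale_Price ~ ., so the formula is only written once.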

On the first two recipes, I get:

> map(bt_samples$recipes[1:2], extract_formula)
[[1]]
Sale_Price ~ Year_Sold + Lot_Area + Neighborhood_College_Creek + 
    Neighborhood_Old_Town + Neighborhood_Edwards + Neighborhood_Somerset + 
    Neighborhood_Northridge_Heights + Neighborhood_Gilbert + 
    Neighborhood_Sawyer + Neighborhood_other + House_Style_One_Story + 
    House_Style_Two_Story + House_Style_other
<environment: 0x7fe610cf4520>

[[2]]
Sale_Price ~ Year_Sold + Lot_Area + Neighborhood_College_Creek + 
    Neighborhood_Old_Town + Neighborhood_Edwards + Neighborhood_Somerset + 
    Neighborhood_Northridge_Heights + Neighborhood_Gilbert + 
    Neighborhood_Sawyer + Neighborhood_other + House_Style_One_Story + 
    House_Style_Two_Story + House_Style_other
<environment: 0x7fe610cd3308>

I'm trying to move away from formulas, but I'll probably add this to the package (formula is a generic, so it will probably be formula.recipe).
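As a rough illustration of how a method on the formula generic works, here is a sketch using a toy class. Everything in it (the toy_recipe class, its var_info field, and new_toy_recipe()) is hypothetical and only stands in for a prepared recipe; it is not the recipes implementation.

```r
# Hypothetical constructor: a list holding variable names and roles,
# tagged with an S3 class so that formula() can dispatch on it.
new_toy_recipe <- function(variable, role) {
  structure(
    list(var_info = data.frame(variable = variable, role = role,
                               stringsAsFactors = FALSE)),
    class = "toy_recipe"
  )
}

# S3 method for the stats::formula() generic.
formula.toy_recipe <- function(x, ...) {
  info <- x$var_info
  x_vars <- paste0(info$variable[info$role == "predictor"], collapse = "+")
  y_vars <- paste0(info$variable[info$role == "outcome"], collapse = "+")
  as.formula(paste(y_vars, x_vars, sep = "~"))
}

toy <- new_toy_recipe(
  variable = c("Sale_Price", "Year_Sold", "Lot_Area"),
  role     = c("outcome", "predictor", "predictor")
)

# formula() dispatches to formula.toy_recipe() because of the class attribute.
f_toy <- formula(toy)
print(f_toy)
```

The appeal of this design is that callers keep using the familiar formula() function, for example inside a call to lm, rather than learning a new helper name.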

EDIT I have the best words.

Where can one read about @Max's tidy modeling approach?
Also, @Max, would this be one of the topics you are going to cover at the rstudio::conf 2018 workshop?

Where can one read about @Max’s tidy modeling approach?

For now, in the package vignettes for recipes, rsample, yardstick, and tidyposterior. There are still some fundamental bits that need to get worked out before I can really turn it loose.

Also, @Max, would this be one of the topics you are going to cover at the rstudio::conf 2018 workshop?

Yes. It will be roughly half new stuff and some high-level APIs from caret. There will be high-level functions for tidy modeling too, but they aren't there yet.

Thank you @Max, very helpful!