Extracting the bare model object from a fitted workflow

Hello!

I'm learning my way through the very nifty tidymodels ecosystem. I'm currently stumped on how to extract the bare fitted model object from a workflow after fitting. Specifically, I'm trying to run anova() on an lm regression model. pull_workflow_fit() gives me back an object with the classes model_fit and _lm, but not a bare lm object (one that anova() would accept, for instance).

I'm sure I'm just missing a step from my understanding of the tidymodels ecosystem. Appreciate the help. Reprex is below!

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 0.1.2 ──
#> ✓ broom     0.7.2      ✓ recipes   0.1.15
#> ✓ dials     0.0.9      ✓ rsample   0.0.8 
#> ✓ dplyr     1.0.2      ✓ tibble    3.0.4 
#> ✓ ggplot2   3.3.2      ✓ tidyr     1.1.2 
#> ✓ infer     0.5.3      ✓ tune      0.1.2 
#> ✓ modeldata 0.1.0      ✓ workflows 0.2.1 
#> ✓ parsnip   0.1.4      ✓ yardstick 0.0.7 
#> ✓ purrr     0.3.4
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
lm_mod <- linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")
recipe1 <- recipe(mpg ~ cyl, data = mtcars)
recipe2 <- recipe(mpg ~ cyl + hp, data = mtcars)
wf <- workflow() %>% 
  add_recipe(recipe1) %>% 
  add_model(lm_mod)
model1 <- wf %>% fit(mtcars) %>% pull_workflow_fit()
model2 <- wf %>% update_recipe(recipe2) %>% fit(mtcars) %>% pull_workflow_fit() 
anova(model1, model2)
#> Error in UseMethod("anova"): no applicable method for 'anova' applied to an object of class "c('_lm', 'model_fit')"

Hi @davidski,

The object you get from pull_workflow_fit() is a parsnip model_fit object, a thin wrapper around the bare model object. It is a list, and the underlying lm fit is stored in its fit element, so you can extract the bare object with model1$fit.

library(tidymodels)

lm_mod <- linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")

recipe1 <- recipe(mpg ~ cyl, data = mtcars)

recipe2 <- recipe(mpg ~ cyl + hp, data = mtcars)

wf <- workflow() %>% 
  add_recipe(recipe1) %>% 
  add_model(lm_mod)

model1 <- wf %>% 
  fit(mtcars) %>% 
  pull_workflow_fit()

model2 <- wf %>% 
  update_recipe(recipe2) %>% 
  fit(mtcars) %>% 
  pull_workflow_fit() 

class(model1)
#> [1] "_lm"       "model_fit"

class(model1$fit)
#> [1] "lm"

anova(model1$fit, model2$fit)
#> Analysis of Variance Table
#> 
#> Model 1: ..y ~ cyl
#> Model 2: ..y ~ cyl + hp
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     30 308.33                           
#>  [ reached getOption("max.print") -- omitted 1 row ]
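
As a hedged aside: releases of workflows newer than the 0.2.1 used in this thread provide extract_fit_engine(), which returns the underlying engine fit without reaching into the list by hand. The sketch below assumes such a newer version is installed. wf_fit and lm_obj are illustrative names, and add_formula() stands in for the recipe only to keep the example short.

```r
library(tidymodels)

# Assumption: a workflows release newer than 0.2.1, where extract_fit_engine() is available
wf_fit <- workflow() %>%
  add_formula(mpg ~ cyl) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(mtcars)

# Returns the bare engine object rather than the parsnip wrapper
lm_obj <- extract_fit_engine(wf_fit)

class(lm_obj)
```

On the versions shown in the reprex, model1$fit remains the way to go.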

Ah ha! I knew it was going to be something simple. Thanks so much!

I suppose I could have inspected the object structure to work that out. Is there a place in the docs where the $fit element is documented? I obviously missed it and would like to understand what else I'm missing!

Hmm, not sure where that would be documented, if at all. I may have just figured it out the hard way by introspecting the parsnip model object.
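
For what it's worth, here is a minimal sketch of that kind of introspection, assuming the same tidymodels setup as above. The name fit1 is made up for illustration, and the model is fitted with parsnip directly (no workflow) to keep it small. The exact set of list elements may differ between parsnip versions, so no output is shown.

```r
library(tidymodels)

# Fit a parsnip linear regression directly, without a workflow
fit1 <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ cyl, data = mtcars)

# List the top-level elements of the parsnip wrapper
names(fit1)

# Show the structure one level deep, without expanding the nested lm object
str(fit1, max.level = 1)

# Check that the fit element holds the bare lm object
inherits(fit1$fit, "lm")
```

Running names() and str() on an unfamiliar object like this is usually enough to spot where the underlying fit lives.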
