Data_grid .model parameter


#1

Hi everyone,
I was asked to provide a reprex and make it available here:

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library(hexbin)
library(modelr)
smaller <- diamonds %>% filter(carat < 3)
mod_diamond2 <- lm(price ~ carat + color + cut + clarity, data = smaller)
grid <- smaller %>% data_grid(cut, .model = mod_diamond2) %>% 
  add_predictions(mod_diamond2)
#> Error in overscope_eval_next(overscope, expr): object 'G' not found

This example worked fine with an older version, but now I get the error above (RStudio 1.0.153, R 3.4.2, packages updated via tidyverse_update()). A possible workaround is to calculate typical() manually before calling data_grid().
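
A minimal sketch of that workaround, assuming the same smaller data and mod_diamond2 model as in the reprex: each remaining predictor is passed to data_grid() explicitly as typical(<column>), so the .model argument (and the failing code path) is not used at all.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data
library(modelr)

smaller <- diamonds %>% filter(carat < 3)
mod_diamond2 <- lm(price ~ carat + color + cut + clarity, data = smaller)

# Supply the typical value of every other predictor by hand
# instead of letting data_grid() derive them from .model
grid <- smaller %>%
  data_grid(
    cut,
    carat   = typical(carat),
    color   = typical(color),
    clarity = typical(clarity)
  ) %>%
  add_predictions(mod_diamond2)
```

This gets tedious with many predictors, which is why the .model shortcut is worth having.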

Thanks for any advice.

Greets


#3

Let’s pick this apart. data_grid() is basically a few calls to tidyr::crossing().

data_grid <- function (data, ..., .model = NULL) {
  expanded <- tidyr::expand(data, ...)
  if (is.null(.model)) 
    return(expanded)
  needed <- setdiff(predictor_vars(.model), names(expanded))
  typical <- tidyr::crossing_(lapply(data[needed], typical))
  tidyr::crossing(expanded, typical)
}

crossing() takes named vectors and generates all crossed combinations of them.

library(tidyverse)
crossing(x = 1:3, y = 1:2)
#> # A tibble: 6 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     1     2
#> 3     2     1
#> 4     2     2
#> 5     3     1
#> 6     3     2

The error arises in the line with crossing_(). Back in the day, say 2015-2016, tidyverse dataframe functions, like select(), filter() or here crossing(), had variants that were geared for programming in packages, and they all ended in _(). These are being phased out as part of the adoption of the “tidy evaluation” framework. So… it looks like this error is a bug in the migration from lazyeval to tidyeval. We can fix this by converting the line to use the new framework.
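
To make the old-versus-new distinction concrete, here is a small sketch using dplyr rather than tidyr. The mtcars data and the variable names cols and col are illustrative choices, not taken from this thread. The underscore call appears only as a comment, since those variants are deprecated.

```r
library(dplyr)

# Old lazyeval style: column names passed as strings to an underscore variant
# select_(mtcars, "mpg", "cyl")

# Tidy evaluation style: the regular verb, with helpers for programmatic use
cols <- c("mpg", "cyl")
selected <- select(mtcars, all_of(cols))

# The .data pronoun looks up a column whose name is stored in a string
col <- "cyl"
filtered <- filter(mtcars, .data[[col]] == 4)
```

The point is that the same verb serves both interactive and programmatic use, so the separate _() functions are no longer needed.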

tidyr::crossing() works on a bunch of named vectors. lapply(data[needed], typical) applies typical() to each column in data[needed], storing the results in a list. Here’s an example on the iris dataset.

lapply(iris, typical)
#> $Sepal.Length
#> [1] 5.8
#> 
#> $Sepal.Width
#> [1] 3
#> 
#> $Petal.Length
#> [1] 4.35
#> 
#> $Petal.Width
#> [1] 1.3
#> 
#> $Species
#> [1] "setosa"     "versicolor" "virginica" 

We need to convert this list into a bunch of named vectors for crossing(). In the new tidyeval framework, we can splice a list using !!!. We are kinda popping open a list bubble and spilling each of its vectors into crossing(...).

crossing(!!! lapply(iris, typical))
#> # A tibble: 3 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#>          <dbl>       <dbl>        <dbl>       <dbl>      <chr>
#> 1          5.8           3         4.35         1.3     setosa
#> 2          5.8           3         4.35         1.3 versicolor
#> 3          5.8           3         4.35         1.3  virginica

Thus to fix the function, we fix that one line.

data_grid2 <- function (data, ..., .model = NULL) {
  expanded <- tidyr::expand(data, ...)
  if (is.null(.model)) 
    return(expanded)
  # predictor_vars() is not exported from modelr, hence the :::
  needed <- setdiff(modelr:::predictor_vars(.model), names(expanded))
  # The changed line: splice the list into crossing() with !!! instead of crossing_()
  typical <- tidyr::crossing(!!! lapply(data[needed], typical))
  tidyr::crossing(expanded, typical)
}

smaller %>% 
  data_grid2(cut, .model = mod_diamond2) %>% 
  add_predictions(mod_diamond2)
#> # A tibble: 5 x 5
#>         cut carat color clarity     pred
#>       <ord> <dbl> <chr>   <chr>    <dbl>
#> 1      Fair   0.7     G     SI1 1923.252
#> 2      Good   0.7     G     SI1 2563.082
#> 3 Very Good   0.7     G     SI1 2756.650
#> 4   Premium   0.7     G     SI1 2776.789
#> 5     Ideal   0.7     G     SI1 2909.394

Alternatively, if we don’t want to use !!!, we might change how the crossing() function works instead, using a lift_ adverb from purrr. Specifically, lift_dl() lifts a function from working on dots d to working on a list l instead.

purrr::lift_dl(crossing)(lapply(iris, typical))
#> # A tibble: 3 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#>          <dbl>       <dbl>        <dbl>       <dbl>      <chr>
#> 1          5.8           3         4.35         1.3     setosa
#> 2          5.8           3         4.35         1.3 versicolor
#> 3          5.8           3         4.35         1.3  virginica
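
One more option, not from the original post: base R's do.call() does the same list-to-arguments conversion, which may be worth knowing because purrr 1.0.0 deprecated the lift_*() family. A minimal sketch, where the list vals is a made-up stand-in for lapply(data[needed], typical):

```r
library(tidyr)

# Hypothetical named list standing in for lapply(data[needed], typical)
vals <- list(x = 1:3, y = c("a", "b"))

# Tidy eval: splice the list elements into the dots
spliced <- crossing(!!!vals)

# Base R: each list element becomes a named argument of crossing()
called <- do.call(crossing, vals)

# Both approaches should build the same grid
all.equal(spliced, called)
```

Which one to use inside data_grid2() is mostly a matter of taste; !!! keeps the call looking like an ordinary crossing() call, while do.call() avoids any dependence on tidy evaluation.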