boxCox() won't accept linear model created by map() despite being same class?

dylanjm · January 12, 2019, 2:04am

I have several models that I am trying to generate at once, and I then want to run a separate boxCox() transformation on each. But every time I try to run boxCox(), I get a strange error.

I think it has to do with how the formula changes to .x when using map(). Here is a small reprex to demonstrate my error:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(broom)
library(car)
#> Loading required package: carData
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:purrr':
#> 
#>     some
#> The following object is masked from 'package:dplyr':
#> 
#>     recode

formulas <- c(mpg ~ wt,
              sqrt(mpg) ~ wt, 
              sqrt(mpg) ~ sqrt(wt))

proj_lm <- tibble(formulas) %>% 
  mutate(lm_models = map(formulas, ~lm(formula = .x, data = mtcars)))

boxCox(proj_lm$lm_models[[1]])
#> Error in stats::model.frame(formula = .x, data = mtcars, drop.unused.levels = TRUE): object '.x' not found

Created on 2019-01-11 by the reprex package (v0.2.1)

technocrat · January 12, 2019, 7:36am

boxCox expects as its argument either a formula or a fitted model. Look at what boxCox actually gets:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(broom)
library(car)
#> Loading required package: carData
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:purrr':
#> 
#>     some
#> The following object is masked from 'package:dplyr':
#> 
#>     recode
> formulas <- c(mpg ~ wt,
+               sqrt(mpg) ~ wt, 
+               sqrt(mpg) ~ sqrt(wt))
> 
> proj_lm <- tibble(formulas) %>% 
+   mutate(lm_models = map(formulas, ~lm(formula = .x, data = mtcars)))
> str(proj_lm)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1 obs. of  2 variables:
 $ formulas :List of 1
  ..$ :Class 'formula'  language mpg ~ wt
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 $ lm_models:List of 1
  ..$ :List of 12
  .. ..$ coefficients : Named num  37.29 -5.34
  .. .. ..- attr(*, "names")= chr  "(Intercept)" "wt"
  .. ..$ residuals    : Named num  -2.28 -0.92 -2.09 1.3 -0.2 ...
  .. .. ..- attr(*, "names")= chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. ..$ effects      : Named num  -113.65 -29.116 -1.661 1.631 0.111 ...
  .. .. ..- attr(*, "names")= chr  "(Intercept)" "wt" "" "" ...
  .. ..$ rank         : int 2
  .. ..$ fitted.values: Named num  23.3 21.9 24.9 20.1 18.9 ...
  .. .. ..- attr(*, "names")= chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. ..$ assign       : int  0 1
  .. ..$ qr           :List of 5
  .. .. ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. .. .. .. ..$ : chr  "(Intercept)" "wt"
  .. .. .. ..- attr(*, "assign")= int  0 1
  .. .. ..$ qraux: num  1.18 1.05
  .. .. ..$ pivot: int  1 2
  .. .. ..$ tol  : num 1e-07
  .. .. ..$ rank : int 2
  .. .. ..- attr(*, "class")= chr "qr"
  .. ..$ df.residual  : int 30
  .. ..$ xlevels      : Named list()
  .. ..$ call         : language lm(formula = .x, data = mtcars)
  .. ..$ terms        :Classes 'terms', 'formula'  language mpg ~ wt
  .. .. .. ..- attr(*, "variables")= language list(mpg, wt)
  .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
  .. .. .. .. .. ..$ : chr "wt"
  .. .. .. ..- attr(*, "term.labels")= chr "wt"
  .. .. .. ..- attr(*, "order")= int 1
  .. .. .. ..- attr(*, "intercept")= int 1
  .. .. .. ..- attr(*, "response")= int 1
  .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. .. ..- attr(*, "predvars")= language list(mpg, wt)
  .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
  .. ..$ model        :'data.frame':    32 obs. of  2 variables:
  .. .. ..$ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  .. .. ..$ wt : num  2.62 2.88 2.32 3.21 3.44 ...
  .. .. ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ wt
  .. .. .. .. ..- attr(*, "variables")= language list(mpg, wt)
  .. .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
  .. .. .. .. .. .. ..$ : chr "wt"
  .. .. .. .. ..- attr(*, "term.labels")= chr "wt"
  .. .. .. .. ..- attr(*, "order")= int 1
  .. .. .. .. ..- attr(*, "intercept")= int 1
  .. .. .. .. ..- attr(*, "response")= int 1
  .. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. .. .. ..- attr(*, "predvars")= language list(mpg, wt)
  .. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
  .. ..- attr(*, "class")= chr "lm"
>

and

str(proj_lm$lm_models[[1]])

is just the first of the three. proj_lm$lm_models[[1]] is a fitted lm model, but its stored call is lm(formula = .x, data = mtcars): the call records the placeholder .x, not mpg ~ wt. The error message shows model.frame being run with formula = .x, so boxCox is evidently re-evaluating that call at a point where .x no longer exists. The real formula is still there, in the terms component of the model and in proj_lm$formulas[[1]]. This suggests a tedious process of extracting proj_lm$lm_models[[1]]$call and swapping out .x for proj_lm$formulas[[1]], so that mpg ~ wt sits in its place.
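
A minimal sketch of that swap, using only base R. Here lapply() with a function(.x) stands in for map(), and fit, err and refit are illustrative names. It assumes the real formula can be read back with formula(), which takes it from the model's terms:

```r
# Fit through an anonymous function, so the stored call mentions `.x`.
fit <- lapply(list(mpg ~ wt), function(.x) lm(formula = .x, data = mtcars))[[1]]

fit$call      # lm(formula = .x, data = mtcars)
formula(fit)  # mpg ~ wt, read back from the terms component

# Re-evaluating the stored call fails when no object named `.x` exists.
err <- tryCatch(update(fit, y = TRUE), error = function(e) e)
inherits(err, "error")

# Overwrite the placeholder in the call with the real formula, then refit.
fit$call$formula <- formula(fit)
refit <- update(fit, y = TRUE)
```

Once the call carries the formula itself, anything that refits the model from its call should no longer depend on .x being in scope.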

Tibbles may contain vectors, but they are not themselves vectors, which makes isolating the vectors you need to pass to another function seriatim a huge pain.

I'm too tired to attempt it, but I'd go for nested maps: one over formulas to produce a list of models, and an inner one to apply boxCox.

dylanjm · January 12, 2019, 5:10pm

That truly is a pain. Are there any resources describing this design implementation with purrr::map() or best-practice guidelines for things like this? Surely, I can't be the first person having this problem. Are there other implementations of boxCox that would work without explicitly needing the fitted model formula?

technocrat · January 12, 2019, 5:48pm

Last week, I was trying to map a function that worked perfectly on a single tibble to a nested tibble, which was a list with the group_by variable as one column and a list of tibbles as the second.

What I ended up with was two functions: one collects my analogue of your formulas list and sends its output to the second, which works on a simpler analogue of your lm_models column that my single-tibble function can handle.

So, your first function maps lm over a list of formulas to return a list of models, and your second function maps boxCox over the list of models. (I don't know of any version that will accept a lambda-style formula.)
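
A rough sketch of that two-step split, under a few stated assumptions: fit_lm is a hypothetical helper, do.call() is used so that the formula itself ends up in the stored call instead of .x, and MASS::boxcox() stands in for car::boxCox() so the sketch needs only packages that ship with R. The same pattern should carry over to car::boxCox(), though that is not verified here.

```r
library(MASS)

formulas <- list(mpg ~ wt, sqrt(mpg) ~ wt, sqrt(mpg) ~ sqrt(wt))

# Step 1: fit one model per formula. do.call() builds the call from the
# evaluated formula, so the stored call reads lm(formula = mpg ~ wt, ...)
# rather than lm(formula = .x, ...). quote(mtcars) keeps the data argument
# as a name instead of pasting the whole data frame into the call.
fit_lm <- function(f) do.call("lm", list(formula = f, data = quote(mtcars)))
models <- lapply(formulas, fit_lm)

# Step 2: run the Box-Cox profile on each fitted model. With plotit = FALSE,
# MASS::boxcox() returns the lambda grid (x) and the profile log-likelihood (y).
bc <- lapply(models, boxcox, plotit = FALSE)

# Lambda with the highest profile log-likelihood for each model.
best_lambda <- vapply(bc, function(b) b$x[which.max(b$y)], numeric(1))
```

The lapply() calls can be replaced with purrr::map() without changing the idea; what matters is that the model is built from a call that contains the formula, not a placeholder.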
