# boxCox() won't accept linear model created by map() despite being same class?

I have several models that I am trying to generate at once and then run separate boxCox() transformations. But every time I try to run boxCox(), I am getting a strange error.

I think it has to do with how the formula changes to .x when using map(). Here is a small reprex to demonstrate my error:

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(purrr)
library(broom)
library(car)
#>
#> Attaching package: 'car'
#> The following object is masked from 'package:purrr':
#>
#>     some
#> The following object is masked from 'package:dplyr':
#>
#>     recode

formulas <- c(mpg ~ wt,
sqrt(mpg) ~ wt,
sqrt(mpg) ~ sqrt(wt))

proj_lm <- tibble(formulas) %>%
mutate(lm_models = map(formulas, ~lm(formula = .x, data = mtcars)))

boxCox(proj_lm$lm_models[[1]]) #> Error in stats::model.frame(formula = .x, data = mtcars, drop.unused.levels = TRUE): object '.x' not found  Created on 2019-01-11 by the reprex package (v0.2.1) boxCox expects as its argument either a formula or a fitted model. If you look at what boxCox gets library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union library(purrr) library(broom) library(car) #> Loading required package: carData #> #> Attaching package: 'car' #> The following object is masked from 'package:purrr': #> #> some #> The following object is masked from 'package:dplyr': #> #> recode > formulas <- c(mpg ~ wt, + sqrt(mpg) ~ wt, + sqrt(mpg) ~ sqrt(wt)) > > proj_lm <- tibble(formulas) %>% > str(proj_lm) Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 2 variables:$ formulas :List of 1
..$:Class 'formula' language mpg ~ wt .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>$ lm_models:List of 1
..$:List of 12 .. ..$ coefficients : Named num  37.29 -5.34
.. .. ..- attr(*, "names")= chr  "(Intercept)" "wt"
.. ..$residuals : Named num -2.28 -0.92 -2.09 1.3 -0.2 ... .. .. ..- attr(*, "names")= chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ... .. ..$ effects      : Named num  -113.65 -29.116 -1.661 1.631 0.111 ...
.. .. ..- attr(*, "names")= chr  "(Intercept)" "wt" "" "" ...
.. ..$rank : int 2 .. ..$ fitted.values: Named num  23.3 21.9 24.9 20.1 18.9 ...
.. .. ..- attr(*, "names")= chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
.. ..$assign : int 0 1 .. ..$ qr           :List of 5
.. .. ..$qr : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ... .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. ..$ : chr  "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
.. .. .. .. ..$: chr "(Intercept)" "wt" .. .. .. ..- attr(*, "assign")= int 0 1 .. .. ..$ qraux: num  1.18 1.05
.. .. ..$pivot: int 1 2 .. .. ..$ tol  : num 1e-07
.. .. ..$rank : int 2 .. .. ..- attr(*, "class")= chr "qr" .. ..$ df.residual  : int 30
.. ..$xlevels : Named list() .. ..$ call         : language lm(formula = .x, data = mtcars)
.. ..$terms :Classes 'terms', 'formula' language mpg ~ wt .. .. .. ..- attr(*, "variables")= language list(mpg, wt) .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1 .. .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
.. .. .. .. .. ..$: chr "wt" .. .. .. ..- attr(*, "term.labels")= chr "wt" .. .. .. ..- attr(*, "order")= int 1 .. .. .. ..- attr(*, "intercept")= int 1 .. .. .. ..- attr(*, "response")= int 1 .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> .. .. .. ..- attr(*, "predvars")= language list(mpg, wt) .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric" .. .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt" .. ..$ model        :'data.frame':    32 obs. of  2 variables:
.. .. ..$mpg: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... .. .. ..$ wt : num  2.62 2.88 2.32 3.21 3.44 ...
.. .. ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ wt
.. .. .. .. ..- attr(*, "variables")= language list(mpg, wt)
.. .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. ..$: chr [1:2] "mpg" "wt" .. .. .. .. .. .. ..$ : chr "wt"
.. .. .. .. ..- attr(*, "term.labels")= chr "wt"
.. .. .. .. ..- attr(*, "order")= int 1
.. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. ..- attr(*, "response")= int 1
.. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. ..- attr(*, "predvars")= language list(mpg, wt)
.. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
.. ..- attr(*, "class")= chr "lm"
>



and

str(proj_lm$lm_models[[1]])  is just the first of the three. proj_lm$lm_models[[1]] is a fitted lm model, it has silently substituted .x for mpg ~ wt in the first formula, and there's no way to retrieve that without digging in parallel with proj_lm$formulas[[1]], where mpg ~ wt is stashed. This suggests a tedious process of extracting proj_lm$lm_models[[1]]$call where we find lm(formula = .x, data = mtcars) and applying stringr to swap out .x for proj_lm$formulas[[1]] to insert mpg ~ wt in its place.

tibbles may contain vectors but they are not themselves vectors, which makes isolating the vectors you need to pass to another function seriatim a huge pain.

I'm too tired to attempt it, but I'd go for nested maps, one over fomulas to produce a vector of models and the inner one to apply boxCox.

2 Likes

That truly is a pain. Are there any resources describing this design implementation with purrr::map() or best-practice guidelines for things like this? Surely, I can't be the first person having this problem. Are there other implementations of boxCox that would work without explicitly needing the fitted model formula?

Last week, I was trying to map a function that worked perfectly on a single tibble to a nested tibble, which was a list with the group_by variable as one column and a list of tibbles as the second.

What I ended up with was two functions, one of which collect my analog of your formulas list and sent the output to the second, a simpler analogue of your lm_model tibble for my single tibble function to handle.

So, your first function maps lm to a list of formulas to return a list of models, and your second function maps the list of models to boxCox (I don't know of any version that will accept a \lambda style formula.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.