I'll try to answer your second question, and then you can follow up if you need more clarification on `.` vs `.x`.
Here is reproducible code to build the models, once using `.` and again using `.x`. The short story is that they are equivalent in terms of the data used to build the models. I'll explain why below.
```r
library(tidyr)
library(purrr)
library(dplyr, warn.conflicts = FALSE)

nested_cars <- mtcars %>%
  as_tibble() %>%
  group_by(cyl) %>%
  nest()

z1 <- nested_cars %>%
  mutate(
    model = map(data, ~ lm(mpg ~ disp, data = .))
  )

z2 <- nested_cars %>%
  mutate(
    model = map(data, ~ lm(mpg ~ disp, data = .x))
  )
```
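Before digging into the comparison, it helps to see what the two pronouns mean. Here is a small sketch using `purrr::as_mapper()`, which turns a formula lambda into a function: inside the formula, `.` and `.x` are both bound to the first argument, so these two lambdas behave identically. The names `f_dot` and `f_x` are just for illustration.

```r
library(purrr)

# as_mapper() converts a formula into a function; inside it,
# `.` and `.x` both refer to the first argument
f_dot <- as_mapper(~ . * 2)
f_x   <- as_mapper(~ .x * 2)

f_dot(21)
#> [1] 42
f_x(21)
#> [1] 42

# mapping with either pronoun gives the same result
identical(map(1:3, ~ . * 2), map(1:3, ~ .x * 2))
#> [1] TRUE
```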
If you try to compare these two data frames directly, you hit an error. I'm not quite sure why this error occurs (it may be a dplyr bug), but it has nothing to do with whether or not the models are different.
```r
# this error seems to come from the `data` column, not from the models
all.equal(z1, z2)
#> Error: Can't join on 'data' x 'data' because of incompatible types (vctrs_list_of/vctrs_vctr / vctrs_list_of/vctrs_vctr)
```
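To check that the `data` column holds the same thing in both tibbles, you can compare columns one at a time with `identical()` instead of comparing the whole data frames. This sketch rebuilds the objects so it runs on its own; it assumes a dplyr/tidyr version where `nest()` on grouped data returns a list column named `data`.

```r
library(tidyr)
library(purrr)
library(dplyr, warn.conflicts = FALSE)

nested_cars <- mtcars %>%
  as_tibble() %>%
  group_by(cyl) %>%
  nest()

z1 <- nested_cars %>% mutate(model = map(data, ~ lm(mpg ~ disp, data = .)))
z2 <- nested_cars %>% mutate(model = map(data, ~ lm(mpg ~ disp, data = .x)))

# mutate() only added `model`; the grouping column and nested data are untouched
identical(z1$cyl, z2$cyl)
#> [1] TRUE
identical(z1$data, z2$data)
#> [1] TRUE
```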
Let's try comparing the list of models directly.
```r
# just look at `model`
all.equal(z1$model, z2$model)
#> [1] "Component 1: Component 10: target, current do not match when deparsed"
#> [2] "Component 2: Component 10: target, current do not match when deparsed"
#> [3] "Component 3: Component 10: target, current do not match when deparsed"
```
Okay, so something is different between them. Each message is telling us to "look at element N of the list (one per model), then look at element 10 of that model". Element 10 is where we should find our differences, and it differs in all three models. Let's take a look:
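If you want to confirm which component "element 10" is without counting, look at the names of a fitted `lm` object. A minimal sketch on the full `mtcars` data (not the per-cylinder subsets); note that the position assumes no extra components such as `na.action`, which `lm()` adds when rows with missing values are dropped:

```r
# fit the same kind of model on the full mtcars data
fit <- lm(mpg ~ disp, data = mtcars)

# the 10th component of the fitted object is the stored call
names(fit)[10]
#> [1] "call"
fit$call
#> lm(formula = mpg ~ disp, data = mtcars)
```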
```r
# these parts of the model are different
z1$model[[1]][[10]]
#> lm(formula = mpg ~ disp, data = .)
z2$model[[1]][[10]]
#> lm(formula = mpg ~ disp, data = .x)
```
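Ah, see the `data = .` vs `data = .x` part? That comes from the way you specified the `lm()` "call": `lm()` keeps a copy of its call exactly as you wrote it, including the name you used for `data`. `all.equal()` is sensitive enough to discover that these are different.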
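Here is a small sketch of that outside of `map()` and the nested tibble: the same data frame is bound by hand to a variable called `.` and to one called `.x` (the `local()` wrappers and the names `fit_dot`/`fit_x` are just for illustration). Internally `lm()` records its call with `match.call()`, so what gets stored for `data` is the name you typed, not the data it evaluated to.

```r
# bind the same data frame to a variable named `.` and to one named `.x`
fit_dot <- local({
  . <- mtcars
  lm(mpg ~ disp, data = .)
})
fit_x <- local({
  .x <- mtcars
  lm(mpg ~ disp, data = .x)
})

# the stored calls differ only in the name used for `data`
fit_dot$call
#> lm(formula = mpg ~ disp, data = .)
fit_x$call
#> lm(formula = mpg ~ disp, data = .x)

# but the fitted coefficients are the same
all.equal(coef(fit_dot), coef(fit_x))
#> [1] TRUE
```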
But the models themselves were fit to the same data! We can validate this by looking at the coefficients of each model.
```r
# extract out the coefficients of each model like this
map(z1$model, coefficients)
#> [[1]]
#> (Intercept)        disp 
#> 19.081987419 0.003605119 
#> 
#> [[2]]
#> (Intercept)        disp 
#>  40.8719553  -0.1351418 
#> 
#> [[3]]
#> (Intercept)        disp 
#> 22.03279891 -0.01963409

# these are equivalent
all.equal(
  map(z1$model, coefficients),
  map(z2$model, coefficients)
)
#> [1] TRUE
```
If you still aren't convinced, you can delete the call from each model (the only part that differs) and check again.
```r
# drop element 10 (the call) from every model
z1_model_no_call <- map(z1$model, ~ .x[-10])
z2_model_no_call <- map(z2$model, ~ .x[-10])
all.equal(z1_model_no_call, z2_model_no_call)
#> [1] TRUE
```
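Dropping element 10 by position works for these fits, but it depends on `call` sitting at that index (for example, a model fit to data with dropped `NA` rows gains an `na.action` component earlier in the list, which pushes `call` back by one). Here is a sketch that removes the component by name instead. `drop_call()` is a hypothetical helper, and the models are rebuilt with `split()` rather than the nested tibble only to keep the example self-contained.

```r
library(purrr)

# hypothetical helper: remove the stored call by name, keeping class "lm"
drop_call <- function(fit) {
  fit$call <- NULL
  fit
}

# one data frame per cylinder count, fit once with each pronoun
groups <- split(mtcars, mtcars$cyl)
models_dot <- map(groups, ~ lm(mpg ~ disp, data = .))
models_x <- map(groups, ~ lm(mpg ~ disp, data = .x))

# the calls differ, so the raw models do not compare equal ...
isTRUE(all.equal(models_dot, models_x))
#> [1] FALSE

# ... but they do once the call is removed
all.equal(map(models_dot, drop_call), map(models_x, drop_call))
#> [1] TRUE
```

Removing by name also keeps the `lm` class on each object, whereas subsetting with `[` drops it, so the stripped models can still be passed to functions like `coef()`.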