Different behavior between . and .x with list columns

dkane · November 4, 2019, 11:11pm

I am confused about the difference between using .x and . with a map function and list columns. Example:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
mtcars %>% 
  as_tibble() %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .))) -> z1

mtcars %>% 
  as_tibble() %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .x))) -> z2

all.equal(z1, z2)
#> Error: Can't join on 'data' x 'data' because of incompatible types (vctrs_list_of/vctrs_vctr / vctrs_list_of/vctrs_vctr)

Created on 2019-11-04 by the reprex package (v0.3.0)

Is there some reason to prefer the use of . or of .x? I am interested in following best practices.
I believe that these objects are close to the same, which is what visual inspection suggests. But there must be some difference. What is it? Should I care?

mishabalyasin · November 5, 2019, 1:58pm

There might be some difference, but your reprex doesn't show it. The last line errors out; it doesn't say that the two objects are different. So how did you decide that they are?

davis · November 5, 2019, 2:14pm

I'll try and answer your second question, and then you can follow up if you need more clarification on . vs .x.

Here is reproducible code to build the models, once using . and again using .x. The short story is that they are equivalent in terms of the data that are used to build the models. I'll explain why below.

library(tidyr)
library(purrr)
library(dplyr, warn.conflicts = FALSE)

nested_cars <- mtcars %>% 
  as_tibble() %>% 
  group_by(cyl) %>% 
  nest() 

z1 <- nested_cars %>%
  mutate(
    model = map(data, ~ lm(mpg ~ disp, data = .))
  )

z2 <- nested_cars %>%
  mutate(
    model = map(data, ~ lm(mpg ~ disp, data = .x))
  )

If you try and compare these two data frames directly, you hit an error. I'm not quite sure why this error occurs (it may be a dplyr bug), but it doesn't have to do with whether or not the models are different.

# this is a bug that has to do with the `data` column
all.equal(z1, z2)
#> Error: Can't join on 'data' x 'data' because of incompatible types (vctrs_list_of/vctrs_vctr / vctrs_list_of/vctrs_vctr)

Let's try comparing the list of models directly.

# just look at `model`
all.equal(z1$model, z2$model)
#> [1] "Component 1: Component 10: target, current do not match when deparsed"
#> [2] "Component 2: Component 10: target, current do not match when deparsed"
#> [3] "Component 3: Component 10: target, current do not match when deparsed"

Okay, so something is different between them. This is trying to tell us to "look at element 1 of the list, then look at element 10 of that". That is where we should find our differences. Let's take a look:

# these parts of the model are different
z1$model[[1]][[10]]
#> lm(formula = mpg ~ disp, data = .)

z2$model[[1]][[10]]
#> lm(formula = mpg ~ disp, data = .x)

Ah, see the data = . vs data = .x part? That comes from the way you specified the lm() "call". all.equal() is sensitive enough to discover that these are different.
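
As a small base-R sketch of the same effect (the names d1, d2, fit_a and fit_b are made up for illustration and are not from the thread): an lm object stores the call that created it, so two fits on identical data can still differ in that one component. Looking the component up by name avoids relying on it sitting at position 10.

```r
# Two fits on identical data; only the name passed to `data` differs.
d1 <- mtcars
d2 <- mtcars
fit_a <- lm(mpg ~ disp, data = d1)
fit_b <- lm(mpg ~ disp, data = d2)

# Find the stored call by name rather than by position.
call_pos <- which(names(fit_a) == "call")

# The stored calls differ because they record `d1` vs `d2` ...
calls_identical <- identical(fit_a$call, fit_b$call)

# ... while the fitted coefficients agree.
coefs_equal <- isTRUE(all.equal(coef(fit_a), coef(fit_b)))

print(fit_a$call)
print(fit_b$call)
print(calls_identical)
print(coefs_equal)
```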

But the models themselves were fit to the same data! We can validate this by looking at the coefficients of each model.

# extract out the coefficients of each model like this
map(z1$model, coefficients)
#> [[1]]
#>  (Intercept)         disp 
#> 19.081987419  0.003605119 
#> 
#> [[2]]
#> (Intercept)        disp 
#>  40.8719553  -0.1351418 
#> 
#> [[3]]
#> (Intercept)        disp 
#> 22.03279891 -0.01963409

# these are equivalent
all.equal(
  map(z1$model, coefficients), 
  map(z2$model, coefficients)
)
#> [1] TRUE

If you still aren't convinced, you can delete the call from each model (which is the only different part), and check again.

z1_model_no_call <- map(z1$model, ~.x[-10])
z2_model_no_call <- map(z2$model, ~.x[-10])

all.equal(z1_model_no_call, z2_model_no_call)
#> [1] TRUE
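
A variation on the step above, as a hedged sketch: dropping the call by name instead of by position, in case the component order ever differs. The helper drop_call() and the objects d1, d2, fit_a and fit_b are hypothetical names used only for this illustration.

```r
# Hypothetical helper: remove the stored call from a fitted model by name.
drop_call <- function(fit) {
  fit$call <- NULL
  fit
}

# Two fits on identical data whose stored calls differ (d1 vs d2).
d1 <- mtcars
d2 <- mtcars
fit_a <- lm(mpg ~ disp, data = d1)
fit_b <- lm(mpg ~ disp, data = d2)

# With the call removed, compare everything that is left.
same_without_call <- isTRUE(all.equal(drop_call(fit_a), drop_call(fit_b)))
print(same_without_call)
```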

dkane · November 5, 2019, 2:18pm

Thanks! That is very helpful.

mishabalyasin · November 5, 2019, 9:09pm

One nit: all.equal() is base R, while dplyr has its own all_equal(). It makes no practical difference here, since all_equal() fails with the same error:

> dplyr::all_equal(z1, z2)
Error: Can't join on 'data' x 'data' because of incompatible types (vctrs_list_of/vctrs_vctr / vctrs_list_of/vctrs_vctr)

davis · November 5, 2019, 9:20pm

dplyr registers dplyr:::all.equal.tbl_df(), so calling all.equal() on a tibble already dispatches to dplyr's method.

mishabalyasin · November 5, 2019, 9:32pm

Hm, I didn't know that.
I messed around some more, and it does look like a bug in dplyr, since the following works:

> all.equal(as.data.frame(z1), z2)
 [1] "Attributes: < Names: 1 string mismatch >"                                                
 [2] "Attributes: < Length mismatch: comparison on first 2 components >"                       
 [3] "Attributes: < Component “class”: Lengths (1, 4) differ (string compare on first 1) >"    
 [4] "Attributes: < Component “class”: 1 string mismatch >"                                    
 [5] "Attributes: < Component 2: Modes: numeric, list >"                                       
 [6] "Attributes: < Component 2: Lengths: 3, 2 >"                                              
 [7] "Attributes: < Component 2: names for current but not for target >"                       
 [8] "Attributes: < Component 2: Attributes: < target is NULL, current is list > >"            
 [9] "Attributes: < Component 2: target is numeric, current is tbl_df >"                       
[10] "Component “model”: Component 1: Component 10: target, current do not match when deparsed"
[11] "Component “model”: Component 2: Component 10: target, current do not match when deparsed"
[12] "Component “model”: Component 3: Component 10: target, current do not match when deparsed"

Andrzej · November 6, 2019, 6:27pm

Hi all,
I assume that the intention of the topic's author was to compare two data frames?
At least, that is what class(z1) and class(z2) say:

mtcars %>% 
  as_tibble() %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .))) -> z1

mtcars %>% 
  as_tibble() %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .x))) -> z2

class(z1)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"

class(z2)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"

So both of them are data frames.

My question is: why can't they be compared in a simple way?

The code above is a bit long (@davis), and the only call that seems to work is all.equal(as.data.frame(z1), z2). I don't know why only z1 is wrapped in @mishabalyasin's reply; this works as well: all.equal(as.data.frame(z1), (z2)).
Calling dplyr:::all.equal.tbl_df(z1, z2) directly does not work either, while all.equal(as.data.frame(z1), as.data.frame(z2)) does work. There are also many other methods for comparing data frames in R, for example:
https://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in, but none of them works for comparing z1 and z2.

mishabalyasin · November 6, 2019, 6:54pm

My snippet works because S3 dispatch usually happens on the first argument. If the first argument is a tbl_df, all.equal() will use dplyr:::all.equal.tbl_df; if it is only a data.frame, it will use the base R method. That's why I'm only converting the first argument, and that's why all.equal(as.data.frame(z1), as.data.frame(z2)) works as well.
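
To illustrate the dispatch rule described above, here is a toy base-R sketch. The generic describe() and the class "special" are invented for this example and have nothing to do with dplyr itself; the point is only that the class of the first argument selects the method.

```r
# A toy S3 generic with a default method and one class-specific method.
describe <- function(x, y) UseMethod("describe")
describe.default <- function(x, y) "default method"
describe.special <- function(x, y) "special method"

plain <- list(a = 1)
special <- structure(list(a = 1), class = "special")

# Only the class of the first argument decides which method runs.
first_is_special <- describe(special, plain)
first_is_plain <- describe(plain, special)

print(first_is_special)
print(first_is_plain)
```

Converting only the first argument with as.data.frame() plays the same role as passing `plain` first here: it is enough to steer dispatch away from the class-specific method.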

Andrzej · November 6, 2019, 10:07pm

Thank you @mishabalyasin,

An additional question: is there an explicit way to check what is hiding behind . and .x?
Sometimes this purrr syntax is a bit confusing for a beginner like me, especially in operations with multiple pipe steps.
Any explanations with examples would be greatly appreciated.
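
One possible way to look behind the shorthand, offered as a sketch rather than a definitive answer: purrr::as_mapper() converts a ~ formula into an ordinary function, and that function can be inspected. The names f_dot and f_x are made up for this example. In the resulting function, both . and .x refer to the first argument, which is why the two spellings fit the same models in this thread.

```r
library(purrr)

# Turn the formula shorthand into ordinary functions.
f_dot <- as_mapper(~ . * 2)
f_x <- as_mapper(~ .x * 2)

# Inspect the arguments of the generated function:
# both `.` and `.x` appear, and both stand for the first argument.
arg_names <- names(formals(f_dot))
print(arg_names)

# Calling either function gives the same result.
res_dot <- f_dot(21)
res_x <- f_x(21)
print(res_dot)
print(res_x)
```

Printing f_dot itself also shows the generated function. Within a longer pipeline, .x may be the easier spelling to read, since . is also the placeholder used by the %>% pipe.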
