Use of mutate and %in% with list-columns

I don't understand how %in% works with list-columns. Example:

library(tidyverse)
tibble(x = list(c(1, 2, 3))) %>% 
  mutate(matches = x %in% c(2, 3) %>% sum())
#> # A tibble: 1 x 2
#>   x         matches
#>   <list>      <int>
#> 1 <dbl [3]>       0

Created on 2019-10-07 by the reprex package (v0.3.0)

I would have expected matches to equal 2. After all:

library(tidyverse)
z <- tibble(x = list(c(1, 2, 3)))
z$x[[1]] %in% c(2, 3) %>% sum()
#> [1] 2

Created on 2019-10-07 by the reprex package (v0.3.0)

I never use list-columns, but it seems to me that x is still a list when it is used inside mutate(), whereas you want %in% to compare the first element of x to c(2, 3).

library(tibble)
library(dplyr)
DF <- tibble(x = list(c(1, 2, 3))) 
DF
#> # A tibble: 1 x 1
#>   x        
#>   <list>   
#> 1 <dbl [3]>
DF %>% mutate(matches = x[[1]] %in% c(2, 3) %>% sum())
#> # A tibble: 1 x 2
#>   x         matches
#>   <list>      <int>
#> 1 <dbl [3]>       2

Created on 2019-10-07 by the reprex package (v0.2.1)

Thanks for taking a look at my question.

But that works only because you have hard-coded x[[1]]. If DF has two or more rows, your code won't give a per-row result. Instead, it will repeat the count for the first row (here, matches = 2) in every row.

library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- tibble(x = list(c(1, 2, 3), c(2, 4, 5))) 
DF %>% mutate(matches = x[[1]] %in% c(2, 3) %>% sum())
#> # A tibble: 2 x 2
#>   x         matches
#>   <list>      <int>
#> 1 <dbl [3]>       2
#> 2 <dbl [3]>       2

Created on 2019-10-07 by the reprex package (v0.3.0)

If you have multiple rows and you want a result for each, it seems like a job for group_by() and summarize(), with an additional column to distinguish the rows. Within each one-row group, x[[1]] is that row's vector.

library(tibble)
library(dplyr)

DF <- tibble(Name = c("A", "B"), 
             x = list(c(1, 2, 3), c(2, 4, 5))
             ) 
DF
#> # A tibble: 2 x 2
#>   Name  x        
#>   <chr> <list>   
#> 1 A     <dbl [3]>
#> 2 B     <dbl [3]>
DF %>% group_by(Name) %>% summarize(matches = x[[1]] %in% c(2, 3) %>% sum())
#> # A tibble: 2 x 2
#>   Name  matches
#>   <chr>   <int>
#> 1 A           2
#> 2 B           1

Created on 2019-10-07 by the reprex package (v0.2.1)
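
For what it's worth, a per-row count may not strictly need a grouping column. Here is a minimal sketch, assuming the goal is one count per row: iterate over the list-column with base R's vapply() inside mutate(). The helper name count_matches is my own invention, not something from dplyr, and purrr::map_int() should work equally well in its place.

```r
library(tibble)
library(dplyr)

DF <- tibble(x = list(c(1, 2, 3), c(2, 4, 5)))

# Hypothetical helper: count how many elements of one vector appear in `table`
count_matches <- function(v, table) sum(v %in% table)

# vapply() applies the helper to each element of the list-column
# and returns one integer per row
result <- DF %>%
  mutate(matches = vapply(x, count_matches, integer(1), table = c(2, 3)))

result$matches
#> [1] 2 1
```

The integer(1) template makes vapply() fail loudly if the helper ever returns something other than a single integer, which seems safer than sapply() here.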

I'm going to take away the sum() and then the tidyverse trappings to try to see what this looks like "under the hood" to %in%. The summary point here is that list(c(1, 2, 3)) %in% c(2, 3) returns a single FALSE, while the numeric vector inside that list returns a result for each of its elements.

Not sure if this clears things up, but I find it helpful to break things down a bit.

library(tidyverse)
tibble(x = list(c(1, 2, 3))) %>% 
  mutate(matches = x %in% c(2, 3))
#> # A tibble: 1 x 2
#>   x         matches
#>   <list>    <lgl>  
#> 1 <dbl [3]> FALSE

list(c(1, 2, 3)) %in% c(2, 3)
#> [1] FALSE

x <- list(c(1, 2, 3))

x %in% c(2, 3)
#> [1] FALSE

y <- x[[1]]

class(x)
#> [1] "list"

class(y)
#> [1] "numeric"

y %in% c(2, 3)
#> [1] FALSE  TRUE  TRUE

Created on 2019-10-08 by the reprex package (v0.3.0)
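
One more way to look at it, as a sketch rather than a full account of the internals: %in% returns one result per element of its left-hand side, and a list holding a single vector has length 1. The help page ?match also says that lists are converted to character vectors before matching, so the whole vector ends up being compared as one string. The variable names flat and unlisted below are just for illustration.

```r
x <- list(c(1, 2, 3))

# One element on the left-hand side, so %in% gives one result
length(x)
#> [1] 1

# The single list element rendered as a character string
as.character(x)
#> [1] "c(1, 2, 3)"

# A list whose elements are single numbers is matched element by element
flat <- list(1, 2, 3) %in% c(2, 3)
flat
#> [1] FALSE  TRUE  TRUE

# unlist() recovers the plain numeric vector, which behaves as expected
unlisted <- unlist(x) %in% c(2, 3)
unlisted
#> [1] FALSE  TRUE  TRUE
```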

Mara,

Thanks! I now understand that my confusion has nothing to do with list-columns. Instead, I (mistakenly!) thought that %in% would work the same with lists as it does with vectors. Thanks for showing that it does not.

I also learned that this morning! ;)
