Groupy by intersection of values in one column and second column is separated by underscores

Hello all,

I have a bit of a niche question for you all. Let's assume I have this dataframe:

df <- data.frame("X"=c("A","A","A","B","B","B","C","C","D","E","E"),"Y"=c(3,1,2,3,12,4,7,8,3,10,12))

X    Y
A    3
A    1
A    2
B    3
B    12
B    4
C    7
C    8
D    3
E    10
E    12

Now, what I want to do is find which values are shared. As we can see, value 3 in column Y is shared with A,B, and D in column X. Additionally, value 12 in column Y is shared with B and E in column X.

C in column X does not contain any shared values.

I would like to now create a new dataframe with their complete intersections and values, and entries in column X are separated by underscore.

The resulting dataframe I want looks like this:

new_X    new_Y
A_B_D_E    3
A_B_D_E    1
A_B_D_E    2
A_B_D_E    12
A_B_D_E    4
A_B_D_E    10
C    7
C    8

Thank you all in advance!

I'm not sure that I followed your expected output. What I understand from the description, it doesn't match with that. Certainly 3, 1, 2,12 and 4 is not shared between A, B D and E. Can you please check it?

From your description, I think this is what you're looking for:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- data.frame(X = c("A", "A", "A", "B", "B", "B", "C", "C", "D", "E", "E"),
                 Y = c(3, 1, 2, 3, 12, 4, 7, 8, 3, 10, 12))

df %>%
  group_by(Y) %>%
  mutate(Z = paste(X, collapse = "_")) %>%
  select(-X) %>%
  distinct()
#> # A tibble: 8 x 2
#> # Groups:   Y [8]
#>       Y Z    
#>   <dbl> <chr>
#> 1     3 A_B_D
#> 2     1 A    
#> 3     2 A    
#> 4    12 B_E  
#> 5     4 B    
#> 6     7 C    
#> 7     8 C    
#> 8    10 E

Created on 2019-04-03 by the reprex package (v0.2.1)

1 Like

Thank you for your response. I recognize that that 3, 1, 2,12 and 4 is not shared between A, B D and E. However, because A_B_D has a shared value, and B_E has a shared value, I would like to now have an outer joined result (for lack of a better term) with A_B_D_E and all of the values corresponding to them.

I hope this helps. I also realize this is a strange desired dataframe.

library(tidyverse)
df <- data.frame("X"=c("A","A","A","B","B","B","C","C","D","E","E"),
                 "Y"=c(3,1,2,3,12,4,7,8,3,10,12), 
                 stringsAsFactors = F)
df %>% 
    group_by(Y) %>% 
    mutate(temp1 = list(X)) %>% 
    ungroup() %>% 
    mutate(temp2 = list(Map(function(x) 
    {
        keep(temp1,  function(z) x %in% z) %>% unlist() 
    }, unique(X)))) %>% 
    mutate(Z = map_chr(temp1,  ~ keep(temp2[[1]], function(y) any(.x %in% y)) %>% 
                           unlist(use.names = F) %>% 
                           unique %>% 
                           paste(collapse = "_"))) %>% 
    select(-(temp1:temp2)) %>% 
    distinct(Y, .keep_all = TRUE)
#> # A tibble: 8 x 3
#>   X         Y Z      
#>   <chr> <dbl> <chr>  
#> 1 A         3 A_B_D_E
#> 2 A         1 A_B_D_E
#> 3 A         2 A_B_D_E
#> 4 B        12 A_B_D_E
#> 5 B         4 A_B_D_E
#> 6 C         7 C      
#> 7 C         8 C      
#> 8 E        10 A_B_D_E
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.