Get absolute differences for all combinations of row values that belong to the same group

eimichae · January 20, 2021, 1:44pm

Hello,
Say I have a simple data frame like this (in reality my data frame is larger).

Group = c("d","d","d","t","t","t")
ID = c("d1","d2","d3","t1","t2","t3")
Value = c(15, 10,12,25,20, 22)
df<-data.frame(Group,ID,Value)

I want to find a way to calculated the absolute differences between all combinations of ID values ( excluding itself) within the same “Group”. These values should then be combined (in the column NewValues) into a new data frame that looks like this:

Group = c("d","d","d","t","t","t")
Diff= c("d1d2","d1d3","d2d3","t1t2","t1t3","t2t3")
NewValue=c(5,3,2,3,5,8)
dfNew<-data.frame(Group,Diff,NewValue)

Also, if that calculation cannot be done (i.e. group 5 is missing), NA values for the 'NewValue' column would be ideal.

I tried to use the group_by and mutate function of dplyr but without much success. Help would be very much appreciated!
Mike

FJCC · January 20, 2021, 3:36pm

I think you can start towards your goal using full_join() from the dplyr package. I do not understand how you want to handle the existence of two occurrences of t2.

Group = c("d","d","d","t","t","t")
ID = c("d1","d2","d3","t1","t2","t2")
Value = c(15, 10,12,25,20, 22)
df <- data.frame(Group,ID,Value)
 
library(dplyr)
 
full_join(df, df, by = c("Group")) %>% filter(ID.x < ID.y)
  Group ID.x Value.x ID.y Value.y
1     d   d1      15   d2      10
2     d   d1      15   d3      12
3     d   d2      10   d3      12
4     t   t1      25   t2      20
5     t   t1      25   t2      22

eimichae · January 20, 2021, 4:00pm

Hi,
thank you for your answer. The two occurences of t2 were a mistake (I corrected it, there is now t3).
The "full_join" approach is pretty neat as it provides me with dataframe listing all combinations. Thank you.
But what then? Somehow I have to calculate the absolute difference between each Value.x and Value.y combination and then create a "dfNew" frame as descirbed above.

FJCC · January 20, 2021, 4:18pm

Group = c("d","d","d","t","t","t")
ID = c("d1","d2","d3","t1","t2","t3")
Value = c(15, 10,12,25,20, 22)
df <- data.frame(Group,ID,Value)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

NewDF <- full_join(df, df, by = c("Group")) %>% filter(ID.x < ID.y) %>% 
  mutate(Diff = paste0(ID.x, ID.y), NewValue = abs(Value.x - Value.y))
NewDF
#>   Group ID.x Value.x ID.y Value.y Diff NewValue
#> 1     d   d1      15   d2      10 d1d2        5
#> 2     d   d1      15   d3      12 d1d3        3
#> 3     d   d2      10   d3      12 d2d3        2
#> 4     t   t1      25   t2      20 t1t2        5
#> 5     t   t1      25   t3      22 t1t3        3
#> 6     t   t2      20   t3      22 t2t3        2
NewDF <- NewDF %>% select(Group, Diff, NewValue)
NewDF
#>   Group Diff NewValue
#> 1     d d1d2        5
#> 2     d d1d3        3
#> 3     d d2d3        2
#> 4     t t1t2        5
#> 5     t t1t3        3
#> 6     t t2t3        2

^{Created on 2021-01-20 by the reprex package (v0.3.0)}

eimichae · January 21, 2021, 7:20am

Thank you very much! This is very helpful
I was not aware of the "full_join" command sofar.

I would have a follow up question out of curiosity:
What If I wan to find the all possible absolute combinations between the values that do NOT belong to the same group and calculate their differences (say in a data frame like the one below).

Group=c("c","c","c","s","s","s")
ID = c("d1c","d2c","d3c","d1s","d2s","d3s")
Value = c(15, 10,12,25,22, 30)
df<-data.frame(Group,ID,Value)

I struggled with the full_join approach above (mainly because I was not able to find a command for "everything but Group" (!="Group")

The goal would be to end up with somethig like this

Diff= c("d1cd1s","d1cd2s","d1cd3s","d2cd3s","d2cd2s","d3cd3s")
NewValue=c(10,7,15,20,12,18)
dfnew<-data.frame(Diff,NewValue)

FJCC · January 22, 2021, 4:56am

The key is to do a cross join by setting by = character(), which I had to look up in the help.

Group=c("c","c","c","s","s","s")
ID = c("d1c","d2c","d3c","d1s","d2s","d3s")
Value = c(15, 10,12,25,22, 30)
df<-data.frame(Group,ID,Value)
full_join(df, df, by = character()) %>% 
   mutate(Reduced.x = substr(ID.x, start = 2, stop = 2),
          Reduced.y = substr(ID.y, start = 2, stop = 2)) %>% 
   filter(Group.x == "c", Group.y == "s", Reduced.x <= Reduced.y) %>% 
   mutate(Diff = paste0(ID.x, ID.y), NewValue = abs(Value.x - Value.y)) %>%
   select(Diff, NewValue)

    Diff NewValue
1 d1cd1s       10
2 d1cd2s        7
3 d1cd3s       15
4 d2cd2s       12
5 d2cd3s       20
6 d3cd3s       18

eimichae · January 22, 2021, 2:40pm

Hi FJCC,
Once again. Thank you. This is brilliant.
I played around with your code. It seems that one could remove the whole command:

utate(Reduced.x = substr(ID.x, start = 2, stop = 2),
          Reduced.y = substr(ID.y, start = 2, stop = 2)) %>%

and it still would work. Or am I missing something? What is the reason for adding the "mutate" function?

FJCC · January 22, 2021, 2:55pm

That is a remnant of me trying one thing, changing my mind and not cleaning up the code.

system · January 29, 2021, 2:56pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.