Summarising values from a pairwise table

MurielGB · March 9, 2020, 1:15pm

Hi everyone!

I have a table like this:

ID1 ID2 relatedness
A B 0.5
A C 0.8
B C 0.7

I would like to summarise the relatedness values for each ID (A, B, C).
My first thought was to use the summarise() function to get the min, max, mean, etc. for each accession.

The issue is that if I group_by(ID1), A does get both of its relatedness values in the summary.
But B only gets the value from the last row (0.7), because its other value (0.5, with A) is on the first row, where B is in the ID2 column. C is not summarised at all, since it never appears in ID1.
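To make the problem concrete, here is a minimal sketch, assuming dplyr is installed (the name `sample_df` is only illustrative). Grouping on ID1 alone gives A two rows, B a single row, and leaves C out entirely:

```r
library(dplyr)

sample_df <- data.frame(ID1 = c("A", "A", "B"),
                        ID2 = c("B", "C", "C"),
                        relatedness = c(0.5, 0.8, 0.7),
                        stringsAsFactors = FALSE)

# Grouping on ID1 only sees the rows where an ID is listed first,
# so B's pair with A (where B sits in ID2) is ignored and C never shows up
by_id1 <- sample_df %>%
  group_by(ID1) %>%
  summarise(n = n(), mean_relatedness = mean(relatedness))

by_id1
```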

How can I group by both ID1 and ID2, so that the relatedness summary for each accession uses the values from both columns?

Thanks!

Muriel

andresrcs · March 9, 2020, 3:20pm

This is what I understood from your explanation:

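library(tidyverse)

# Before R 4.0, data.frame() turned strings into factors by default, so ID1
# and ID2 are factors with different levels; gather() drops those attributes,
# which is what the warning below is about.
sample_df <- data.frame(ID1 = c("A", "A", "B"),
                        ID2 = c("B", "C", "C"),
                        relatedness = c(0.5, 0.8, 0.7))

sample_df %>%
    # Stack ID1 and ID2 into one column (Value); every pair now contributes
    # one row per member, each carrying the pair's relatedness
    gather(ID_num, Value, -relatedness) %>%
    group_by(Value) %>%
    summarise(sum_relatedness = sum(relatedness))
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#> # A tibble: 3 x 2
#>   Value sum_relatedness
#>   <chr>           <dbl>
#> 1 A                 1.3
#> 2 B                 1.2
#> 3 C                 1.5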

Created on 2020-03-09 by the reprex package (v0.3.0.9001)
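The reply above computes a sum; the min, max and mean asked about in the question follow the same stack-then-group pattern. Below is a sketch using pivot_longer(), which tidyr 1.0.0 and later recommends over gather(). The column names `position` and `ID` and the `*_rel` summary names are illustrative choices, not anything from the thread.

```r
library(dplyr)
library(tidyr)

sample_df <- data.frame(ID1 = c("A", "A", "B"),
                        ID2 = c("B", "C", "C"),
                        relatedness = c(0.5, 0.8, 0.7),
                        stringsAsFactors = FALSE)

# One row per pair member: 3 pairs x 2 ID columns = 6 rows.
# `position` records whether the ID came from ID1 or ID2.
id_summary <- sample_df %>%
  pivot_longer(c(ID1, ID2), names_to = "position", values_to = "ID") %>%
  group_by(ID) %>%
  summarise(n        = n(),
            min_rel  = min(relatedness),
            max_rel  = max(relatedness),
            mean_rel = mean(relatedness))

id_summary
```

Because every pair contributes one row per member, each ID's summary draws on all the pairs it belongs to, whichever column it was listed in. Setting `stringsAsFactors = FALSE` keeps the ID columns as character vectors on R versions before 4.0.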

MurielGB · March 10, 2020, 6:07am

Thanks for your reply, @andresrcs!
I see how to do it now!

MurielGB · March 10, 2020, 6:22am

Thanks very much @andresrcs, that's it!

system · March 17, 2020, 6:25am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.