Filtering on everything but the current group

psimon · May 1, 2021, 8:50am

Hello,

I have a tibble with 2 columns: the first is a factor (6 possible values) and the second is a gene expression level (double). I would like to perform a t-test for each group. To do this, I need the expression levels from the current group (the x argument of t.test) and the expression levels from every other group (the y argument of t.test).
Below is an example that works only for the first group, because the filter line hard-codes "BCR". I would like to change the filter line so that I get the proper y for each group.
I had two ideas for doing that:

1. Retrieve the name of the current group. From what I could find on Stack Overflow, this does not seem to be possible.
2. Take the set difference between the whole dataset and .x, but I didn't succeed.

library(tidyverse)
library(broom)

# A sample of my data
data <- tribble (
  ~value, ~expr_level,
  "BCR", 0.564,
  "BCR", 0.841,
  "E2A", 0.214,
  "E2A", 0.147,
  "MLL", 0.451,
  "MLL", 0.411
)

data %>%
  group_by(value) %>% 
  group_map(~ t.test(x = .x %>% select(expr_level), 
                     y = data %>% 
                       filter(value != "BCR") %>% 
                       select(expr_level)) %>% 
              tidy() %>% 
              select(statistic))
#> [[1]]
#> # A tibble: 1 x 1
#>   statistic
#>       <dbl>
#> 1      2.53
#> 
#> [[2]]
#> # A tibble: 1 x 1
#>   statistic
#>       <dbl>
#> 1     -1.54
#> 
#> [[3]]
#> # A tibble: 1 x 1
#>   statistic
#>       <dbl>
#> 1      1.63

Created on 2021-05-01 by the reprex package (v2.0.0)

Thanks for your input.
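
A note on the two ideas listed in this post: as far as I know, dplyr's group_modify() (and group_map()) hands the function a second argument, a one-row tibble holding the current group's key, so the group name can be retrieved. The sketch below only shows how that key could be used to get "everything but the current group" with anti_join(); it assumes a reasonably recent dplyr, and the names group_rows, key, rest and split_sizes are made up for illustration.

```r
library(dplyr)
library(tibble)

# Same sample data as in the question
data <- tribble(
  ~value, ~expr_level,
  "BCR", 0.564,
  "BCR", 0.841,
  "E2A", 0.214,
  "E2A", 0.147,
  "MLL", 0.451,
  "MLL", 0.411
)

split_sizes <- data %>%
  group_by(value) %>%
  group_modify(function(group_rows, key) {
    # `group_rows`: rows of the current group (grouping column dropped)
    # `key`: one-row tibble with the current group's `value`
    # anti_join() keeps the rows of `data` whose `value` does NOT match the key
    rest <- anti_join(data, key, by = "value")
    tibble(n_current = nrow(group_rows), n_rest = nrow(rest))
  }) %>%
  ungroup()
```

With the sample data, each group has 2 rows and the complement has the remaining 4, which is a quick way to check that the split is right before plugging in t.test.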

pieterjanvc · May 1, 2021, 1:11pm

Hi,

There is probably a way to do this entirely within the tidyverse, but sometimes I just like to combine base R functions with tidyverse ones to get a clean and easy result:

library(tidyverse)

# A sample of my data
data <- tribble (
  ~value, ~expr_level,
  "BCR", 0.564,
  "BCR", 0.841,
  "E2A", 0.214,
  "E2A", 0.147,
  "MLL", 0.451,
  "MLL", 0.411
)

# Map over all unique values
result <- map_df(unique(data$value), function(gene) {
  data.frame(
    gene = gene,
    tstatistic = t.test(
      data %>% filter(value == gene) %>% pull(expr_level),
      data %>% filter(value != gene) %>% pull(expr_level)
    )$statistic %>% as.numeric()
  )
})

result
#>   gene  tstatistic
#> 1  BCR  2.52624282
#> 2  E2A -3.76428272
#> 3  MLL -0.06451184

Created on 2021-05-01 by the reprex package (v2.0.0)

I think this is what you wanted, right?

Hope this helps,
PJ

psimon · May 1, 2021, 5:12pm

Hi,

I was searching for some kind of tidyverse trick, but I have to admit your answer looks clean and does the job.
Thank you!
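
For anyone still after a tidyverse-only variant, here is a minimal sketch. It assumes a dplyr version that provides group_modify(), which (to my understanding) calls the function once per group with the group's rows and a one-row tibble holding the group key. The names group_rows, key, current and others are illustrative, not anything from the posts above. It should give the same Welch t statistics as the map_df() answer.

```r
library(dplyr)
library(tibble)

# Same sample data as in the thread
data <- tribble(
  ~value, ~expr_level,
  "BCR", 0.564,
  "BCR", 0.841,
  "E2A", 0.214,
  "E2A", 0.147,
  "MLL", 0.451,
  "MLL", 0.411
)

result <- data %>%
  group_by(value) %>%
  group_modify(function(group_rows, key) {
    # Name of the current group, taken from the key tibble
    current <- key$value
    # Expression levels of every row outside the current group
    others <- data %>%
      filter(value != current) %>%
      pull(expr_level)
    # t.test() defaults to the Welch (unequal variance) test
    tibble(
      tstatistic = unname(t.test(x = group_rows$expr_level, y = others)$statistic)
    )
  }) %>%
  ungroup()
```

The result is a tibble with one row per group (columns value and tstatistic). Using pull() or $ rather than select() matters here: t.test expects numeric vectors, not one-column tibbles.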
