Calculate multiple t.tests between two datasets

naifz22 · March 9, 2022, 8:38pm

First time posting here!

I'm having difficulty combining two datasets to calculate t.test:

library('tidyverse')

set.seed(123)
x <- data.frame( compounds=c( "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 
                 sample1=sample( 1:6, replace=T, 10),
                 sample2=sample( 1:10, replace = T, 10), 
                 sample3=sample( 1:10, replace = T, 10), 
                 sample4=sample( 1:10, replace = T, 10),
                 sample5=sample( 1:10, replace = T, 10))

y <- data.frame( group1=sample( c(0,1), replace=T, 5),
                 group2=sample( c(0,1), replace=T, 5), 
                 group3=sample( c(0,1), replace=T, 5))
rownames(y) <- c( "sample1", "sample2", "sample3", "sample4", "sample5")

> x
   compounds sample1 sample2 sample3 sample4 sample5
1          a       3       6      10       7       2
2          b       6       9       7       9       5
3          c       3      10      10       9       8
4          d       2       5       9      10       2
5          e       2       3       3       7       1
6          f       6       9       4       5       9
7          g       3       9       1       7       9
8          h       5       9       7       5       6
9          i       4       3       5       6       5
10         j       6       8      10       9       9

> y
        group1 group2 group3
sample1      0      1      1
sample2      1      1      1
sample3      1      1      0
sample4      0      0      0
sample5      1      1      1

I want to calculate t.test with the numerical values in "x".
The groupings for t.test(a, b) are defined in "y" by 0 and 1.
For each grouping in "y" I want to calculate t.test for all compounds "a" to "j".

The column names in "x" are rownames in "y"

I can do this in a very lengthy way using for loops, but would love your help in doing this with tidyverse.

Thank you!!!

pieterjanvc · March 10, 2022, 1:52am

Hi,

Could you detail what the output should look like? Are you planning on doing 3 t-tests per compound, i.e., a matrix with compounds as rows, groups as columns, and the t-stat as values, where the samples are split into a and b by y? If so, that won't work for group 2 because there is only one 0 value, which would leave a single value on one side of the t-test comparison (not allowed).
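To illustrate the group 2 problem, here is a minimal sketch using compound "a" from the printed data. The variable names `many`, `single` and `p` are only for illustration. With the default (Welch) settings, t.test() stops with an error when one side has a single observation, so the sketch catches the error and falls back to NA.

```r
# Compound "a" split by group2: four samples on one side, one on the other
many   <- c(3, 6, 10, 2)  # sample1, sample2, sample3, sample5 (group2 == 1)
single <- 7               # sample4 (group2 == 0)

# t.test() raises an error here, so catch it and return NA instead
p <- tryCatch(
  t.test(many, single)$p.value,
  error = function(e) NA_real_
)
p
```

This is why any solution needs an explicit check on the group sizes before calling t.test().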

Please provide some more details on what the output should be, preferably with one example.

PJ

naifz22 · March 10, 2022, 2:16am

@pieterjanvc that is correct. I want to perform 3 t.tests per compound based on the groups in y. I want to split the samples in "x" by the groups in "y" and do a t.test. I know the t.test won't work for group 2, so it should result in all "NA."

I want the output to look like this (note: these are all fictitious p.values):

> x
   compounds group1_p.value group2_p.value group3_p.value
1          a           0.06             NA           0.03
2          b           0.06             NA           0.05
3          c           0.09             NA           0.06
4          d           0.07             NA           0.03
5          e           0.01             NA           0.09
6          f           0.03             NA           0.04
7          g           0.07             NA           0.01
8          h           0.04             NA           0.02
9          i           0.08             NA           0.01
10         j           0.03             NA           0.08

pieterjanvc · March 10, 2022, 2:28am

OK in that case here you go:

library('tidyverse')

set.seed(123)
x <- data.frame( compounds=c( "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 
                 sample1=sample( 1:6, replace=T, 10),
                 sample2=sample( 1:10, replace = T, 10), 
                 sample3=sample( 1:10, replace = T, 10), 
                 sample4=sample( 1:10, replace = T, 10),
                 sample5=sample( 1:10, replace = T, 10))

y <- data.frame( group1=sample( c(0,1), replace=T, 5),
                 group2=sample( c(0,1), replace=T, 5), 
                 group3=sample( c(0,1), replace=T, 5))
rownames(y) <- c( "sample1", "sample2", "sample3", "sample4", "sample5")

#For every group ...
z = apply(y, 2, function(group){
  
  #Make group True/False
  group = as.logical(group)
  
  #Check that both groups have at least 2 values (t-test needs this)
  if(sum(group) < 2 || sum(!group) < 2){
    rep(NA, nrow(x))
  } else {
    #Do t test per compound per group
    apply(x[,-1], 1, function(compound){
      t.test(compound[group], compound[!group])$p.value
    })
  }
  
  
}) %>% as.data.frame() %>% 
  mutate(compound = x$compounds) %>% 
  select(compound, everything())

z
#>    compound    group1 group2     group3
#> 1         a 0.7657439     NA 0.11528406
#> 2         b 0.8147895     NA 0.45791385
#> 3         c 0.4609749     NA 0.35287187
#> 4         d 0.8990250     NA 0.01249512
#> 5         e 0.5413955     NA 0.36035825
#> 6         f 0.3886101     NA 0.05772363
#> 7         g 0.7160146     NA 0.49689883
#> 8         h 0.1180829     NA 0.69924825
#> 9         i 0.6374388     NA 0.14802788
#> 10        j 0.4907869     NA 0.17160903

Created on 2022-03-09 by the reprex package (v2.0.1.9000)
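Since the question asked for a tidyverse flavour, here is one possible sketch of the same computation using purrr::map() instead of the outer apply(). It is an alternative, not the code that produced the output above. The helper `safe_p` and the names `values` and `z_tidy` are hypothetical, and the data are typed in from the printed x and y rather than regenerated with set.seed(). Selecting the columns of x by rownames(y) makes the sample-to-group matching explicit instead of relying on column order.

```r
library(tidyverse)

# Data copied from the printed x and y in the question
x <- data.frame(
  compounds = letters[1:10],
  sample1 = c(3, 6, 3, 2, 2, 6, 3, 5, 4, 6),
  sample2 = c(6, 9, 10, 5, 3, 9, 9, 9, 3, 8),
  sample3 = c(10, 7, 10, 9, 3, 4, 1, 7, 5, 10),
  sample4 = c(7, 9, 9, 10, 7, 5, 7, 5, 6, 9),
  sample5 = c(2, 5, 8, 2, 1, 9, 9, 6, 5, 9)
)
y <- data.frame(
  group1 = c(0, 1, 1, 0, 1),
  group2 = c(1, 1, 1, 0, 1),
  group3 = c(1, 1, 0, 0, 1),
  row.names = paste0("sample", 1:5)
)

# Hypothetical helper: p-value, or NA when either side has fewer than 2 values
safe_p <- function(value, in_group) {
  if (sum(in_group) < 2 || sum(!in_group) < 2) return(NA_real_)
  t.test(value[in_group], value[!in_group])$p.value
}

# Order the sample columns of x to match the rows of y
values <- as.matrix(x[, rownames(y)])

z_tidy <- y %>%
  map(as.logical) %>%                                # one logical vector per group
  map(~ apply(values, 1, safe_p, in_group = .x)) %>% # one p-value per compound
  as_tibble() %>%
  rename_with(~ paste0(.x, "_p.value")) %>%
  mutate(compounds = x$compounds, .before = 1)

z_tidy
```

The inner loop over compounds is still apply(); only the loop over groups and the final tidying use tidyverse verbs.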

Hope this helps,
PJ

naifz22 · March 10, 2022, 2:49am

WOW!!!
Thank you so much!
I was thinking of doing this without writing a function, but guess it's better this way.

pieterjanvc · March 10, 2022, 12:47pm

Hi,

Glad I could help. What I wrote was not really a custom function: apply() takes an inline (anonymous) function and calls it once per row or column. It's the same idea as writing a loop, but can be more concise and, in certain cases, more efficient.

You can learn more about apply() here if you like.
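As a small sketch of that idea (the matrix `m` and the result names are made up for illustration), the second argument of apply() selects rows (1) or columns (2), and the row-wise call does the same work as an explicit for loop:

```r
m <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns

# MARGIN = 1: the function is called once per row
row_sums <- apply(m, 1, function(r) sum(r))

# MARGIN = 2: the function is called once per column
col_sums <- apply(m, 2, function(co) sum(co))

# The row-wise version written as an explicit loop
loop_sums <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_sums[i] <- sum(m[i, ])
}

row_sums   # 9 12
col_sums   # 3 7 11
loop_sums  # 9 12
```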

PJ

naifz22 · March 10, 2022, 7:16pm

Hi @pieterjanvc

I would like to add an additional column for each of my p-values calculated in "z." I can create a second object via:

zz <- apply( z, 2, function(pval){
    p.adjust(pval, method="fdr")
})

But I'm not sure how to merge the "z" and "zz" so that the output looks like this:

> z
  group1_pval group1_fdr group2_pval group2_fdr group3_pval group3_fdr
1           0          1           0          1           1          0
2           1          0           0          1           1          1
3           1          0           1          0           1          1
4           0          1           1          0           1          1
5           1          0           0          0           0          0

I have also tried:

z = apply(y, 2, function(group){
  
  #Make group True/False
  group = as.logical(group)
  
  #Check if no group is length 1 (can't do t-test)
  if(sum(group) == 1 | sum(!group) == 1){
    rep(NA, nrow(x))
  } else {
    #Do t test per compound per group
    p  <- apply(x[,-1], 1, function(compound){
      t.test(compound[group], compound[!group])$p.value
    })
  }
  q <- p.adjust( p, method="fdr")
#--- cbind like this does not work...
  cbind(p, q)
}) %>% as.data.frame() %>% 
  mutate(compound = x$compounds) %>% 
  select(compound, everything())

pieterjanvc · March 10, 2022, 8:37pm

Hi,

You were close!
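A likely reason the cbind() attempt does not give side-by-side columns: when the function passed to apply() returns a matrix, apply() flattens it into one long vector per group, so p and q end up stacked on top of each other. (In that attempt, `p` is also never assigned in the NA branch.) A toy sketch of the stacking, with made-up names `m`, `stacked`, `pieces` and `side_by_side`, plus one base R way to keep the columns apart:

```r
m <- matrix(1:6, nrow = 3)  # 3 rows, 2 columns (think: 2 groups)

# Returning a 3 x 2 matrix per column: apply() flattens each one to length 6
stacked <- apply(m, 2, function(col) cbind(col, col * 10))
dim(stacked)  # 6 2

# One alternative: build a small data frame per group, then bind them by column
pieces <- lapply(seq_len(ncol(m)), function(j) {
  data.frame(p = m[, j], q = m[, j] * 10)
})
names(pieces) <- c("g1", "g2")
side_by_side <- do.call(cbind, pieces)
names(side_by_side)  # "g1.p" "g1.q" "g2.p" "g2.q"
```

The approach below avoids the issue differently, by computing the adjusted values in a second pass over the finished table.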

library('tidyverse')

set.seed(123)
x <- data.frame( compounds=c( "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 
                 sample1=sample( 1:6, replace=T, 10),
                 sample2=sample( 1:10, replace = T, 10), 
                 sample3=sample( 1:10, replace = T, 10), 
                 sample4=sample( 1:10, replace = T, 10),
                 sample5=sample( 1:10, replace = T, 10))

y <- data.frame( group1=sample( c(0,1), replace=T, 5),
                 group2=sample( c(0,1), replace=T, 5), 
                 group3=sample( c(0,1), replace=T, 5))
rownames(y) <- c( "sample1", "sample2", "sample3", "sample4", "sample5")

#For every group ...
z = apply(y, 2, function(group){
  
  #Make group True/False
  group = as.logical(group)
  
  #Check that both groups have at least 2 values (t-test needs this)
  if(sum(group) < 2 || sum(!group) < 2){
    rep(NA, nrow(x))
  } else {
    #Do t test per compound per group
    apply(x[,-1], 1, function(compound){
      t.test(compound[group], compound[!group])$p.value
    })
  }
  
  
}) %>% as.data.frame() %>% 
  mutate(compound = x$compounds) %>% 
  select(compound, everything())

#Calculate fdr for all groups
fdr = apply(z[,-1], 2, function(pval){
  p.adjust(pval, method="fdr")
}) %>% as.data.frame()

#Update the column names
colnames(z)[-1] = paste(colnames(z)[-1], "pval", sep = "_")
colnames(fdr) = paste(colnames(fdr), "fdr", sep = "_")

#Merge and sort
z = cbind(z, fdr) 
z = z %>% select(sort(colnames(z)))
z
#>    compound group1_fdr group1_pval group2_fdr group2_pval group3_fdr
#> 1         a   0.899025   0.7657439         NA          NA  0.3432181
#> 2         b   0.899025   0.8147895         NA          NA  0.5521098
#> 3         c   0.899025   0.4609749         NA          NA  0.5147975
#> 4         d   0.899025   0.8990250         NA          NA  0.1249512
#> 5         e   0.899025   0.5413955         NA          NA  0.5147975
#> 6         f   0.899025   0.3886101         NA          NA  0.2886182
#> 7         g   0.899025   0.7160146         NA          NA  0.5521098
#> 8         h   0.899025   0.1180829         NA          NA  0.6992482
#> 9         i   0.899025   0.6374388         NA          NA  0.3432181
#> 10        j   0.899025   0.4907869         NA          NA  0.3432181
#>    group3_pval
#> 1   0.11528406
#> 2   0.45791385
#> 3   0.35287187
#> 4   0.01249512
#> 5   0.36035825
#> 6   0.05772363
#> 7   0.49689883
#> 8   0.69924825
#> 9   0.14802788
#> 10  0.17160903

Created on 2022-03-10 by the reprex package (v2.0.1.9000)
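In case it looks odd that every group1_fdr value is 0.899025: that follows from how the "fdr" method (an alias for Benjamini-Hochberg, "BH") works. Each sorted p-value is scaled by n / rank, and then a running minimum is taken starting from the largest p-value. Here the largest p-value (0.899025) is smaller than every scaled value, so it becomes the adjusted value for all compounds. A sketch with the group1 p-values copied from the output above (the name `manual` is just for illustration):

```r
# group1 p-values copied from the output above (compounds a to j)
p <- c(0.7657439, 0.8147895, 0.4609749, 0.8990250, 0.5413955,
       0.3886101, 0.7160146, 0.1180829, 0.6374388, 0.4907869)

fdr <- p.adjust(p, method = "fdr")

# The same adjustment written out by hand
n <- length(p)
o <- order(p, decreasing = TRUE)             # largest p-value first
manual <- numeric(n)
manual[o] <- pmin(1, cummin(p[o] * n / (n:1)))

all.equal(fdr, manual)

# An all-NA column (like group2) simply stays NA
p.adjust(rep(NA_real_, 3), method = "fdr")
```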

PJ

naifz22 · March 10, 2022, 8:49pm

@pieterjanvc Ahhh! select and sort.
Thanks!!!
