Calculate multiple t-tests between two datasets

First time posting here!

I'm having difficulty combining these two datasets to calculate t-tests:

library('tidyverse')

set.seed(123)
x <- data.frame( compounds=c( "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 
                 sample1=sample( 1:6, replace=T, 10),
                 sample2=sample( 1:10, replace = T, 10), 
                 sample3=sample( 1:10, replace = T, 10), 
                 sample4=sample( 1:10, replace = T, 10),
                 sample5=sample( 1:10, replace = T, 10))

y <- data.frame( group1=sample( c(0,1), replace=T, 5),
                 group2=sample( c(0,1), replace=T, 5), 
                 group3=sample( c(0,1), replace=T, 5))
rownames(y) <- c( "sample1", "sample2", "sample3", "sample4", "sample5")

> x
   compounds sample1 sample2 sample3 sample4 sample5
1          a       3       6      10       7       2
2          b       6       9       7       9       5
3          c       3      10      10       9       8
4          d       2       5       9      10       2
5          e       2       3       3       7       1
6          f       6       9       4       5       9
7          g       3       9       1       7       9
8          h       5       9       7       5       6
9          i       4       3       5       6       5
10         j       6       8      10       9       9

> y
        group1 group2 group3
sample1      0      1      1
sample2      1      1      1
sample3      1      1      0
sample4      0      0      0
sample5      1      1      1

I want to run t.test on the numerical values in "x".
The groupings for t.test(a, b) are defined in "y" by 0 and 1.
For each grouping in "y" I want to run t.test for every compound, "a" to "j".

The sample column names in "x" are the row names of "y".

I can do this in a very lengthy way using for loops, but would love your help doing it with tidyverse.

Thank you!!!

Hi,

Could you describe what the output should look like? Are you planning to run 3 t-tests per compound, i.e., a matrix with compounds as rows, groups as columns, and the t-statistic as values, where y splits the samples into a and b? If so, that won't work for group 2: it contains only one 0, so one side of the comparison would be a single value (not allowed).
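
As a quick illustration of that restriction, here is a minimal sketch using compound "a" and the group2 split from the question (samples 1, 2, 3 and 5 on one side, sample 4 alone on the other). It only relies on base R's t.test() and tryCatch(); the variable name res is just for illustration.

```r
# Welch's t-test (the t.test default) needs at least 2 observations per side.
# Compound "a" split by group2: four values vs. a single value.
res <- tryCatch(
  t.test(c(3, 6, 10, 2), 7),
  error = function(e) e   # capture the error instead of stopping
)

# TRUE: the call failed because one side has only one observation
inherits(res, "error")
```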

Please provide some more details on what the output should be, preferably with one example.

PJ

@pieterjanvc that is correct. I want to perform 3 t-tests per compound based on the groups in y: split the samples in "x" by the groups in "y" and run t.test on each split. I know the t-test won't work for group 2, so that column should be all NA.

I want the output to look like this (note: these are all fictitious p-values):

> x
   compounds group1_p.value group2_p.value group3_p.value
1          a           0.06             NA           0.03
2          b           0.06             NA           0.05
3          c           0.09             NA           0.06
4          d           0.07             NA           0.03
5          e           0.01             NA           0.09
6          f           0.03             NA           0.04
7          g           0.07             NA           0.01
8          h           0.04             NA           0.02
9          i           0.08             NA           0.01
10         j           0.03             NA           0.08

OK in that case here you go:

library('tidyverse')

set.seed(123)
x <- data.frame( compounds=c( "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 
                 sample1=sample( 1:6, replace=T, 10),
                 sample2=sample( 1:10, replace = T, 10), 
                 sample3=sample( 1:10, replace = T, 10), 
                 sample4=sample( 1:10, replace = T, 10),
                 sample5=sample( 1:10, replace = T, 10))

y <- data.frame( group1=sample( c(0,1), replace=T, 5),
                 group2=sample( c(0,1), replace=T, 5), 
                 group3=sample( c(0,1), replace=T, 5))
rownames(y) <- c( "sample1", "sample2", "sample3", "sample4", "sample5")

#For every group ...
z = apply(y, 2, function(group){
  
  #Make group True/False
  group = as.logical(group)
  
  #Skip if either side has fewer than 2 samples (can't do t-test)
  if(sum(group) < 2 || sum(!group) < 2){
    rep(NA, nrow(x))
  } else {
    #Do t test per compound per group
    apply(x[,-1], 1, function(compound){
      t.test(compound[group], compound[!group])$p.value
    })
  }
  
  
}) %>% as.data.frame() %>% 
  mutate(compound = x$compounds) %>% 
  select(compound, everything())

z
#>    compound    group1 group2     group3
#> 1         a 0.7657439     NA 0.11528406
#> 2         b 0.8147895     NA 0.45791385
#> 3         c 0.4609749     NA 0.35287187
#> 4         d 0.8990250     NA 0.01249512
#> 5         e 0.5413955     NA 0.36035825
#> 6         f 0.3886101     NA 0.05772363
#> 7         g 0.7160146     NA 0.49689883
#> 8         h 0.1180829     NA 0.69924825
#> 9         i 0.6374388     NA 0.14802788
#> 10        j 0.4907869     NA 0.17160903

Created on 2022-03-09 by the reprex package (v2.0.1.9000)
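
Since the question asked for a tidyverse route, here is one possible sketch of the same calculation in long format. It is not the approach used above, just an alternative under a few assumptions: the data are typed in from the printed x and y rather than regenerated with set.seed(), and the names safe_p and z_tidy are made up for this example.

```r
library(tidyverse)

# Data copied from the printed x and y in the question
x <- data.frame(
  compounds = letters[1:10],
  sample1 = c(3, 6, 3, 2, 2, 6, 3, 5, 4, 6),
  sample2 = c(6, 9, 10, 5, 3, 9, 9, 9, 3, 8),
  sample3 = c(10, 7, 10, 9, 3, 4, 1, 7, 5, 10),
  sample4 = c(7, 9, 9, 10, 7, 5, 7, 5, 6, 9),
  sample5 = c(2, 5, 8, 2, 1, 9, 9, 6, 5, 9)
)
y <- data.frame(
  group1 = c(0, 1, 1, 0, 1),
  group2 = c(1, 1, 1, 0, 1),
  group3 = c(1, 1, 0, 0, 1),
  row.names = paste0("sample", 1:5)
)

# Hypothetical helper: p-value, or NA when either side has fewer than 2 values
safe_p <- function(value, flag) {
  a <- value[flag == 1]
  b <- value[flag == 0]
  if (length(a) < 2 || length(b) < 2) return(NA_real_)
  t.test(a, b)$p.value
}

z_tidy <- x %>%
  # one row per compound/sample pair
  pivot_longer(-compounds, names_to = "sample", values_to = "value") %>%
  # attach the 0/1 flags for each sample
  left_join(rownames_to_column(y, "sample"), by = "sample") %>%
  # one row per compound/sample/group
  pivot_longer(starts_with("group"), names_to = "group", values_to = "flag") %>%
  group_by(compounds, group) %>%
  summarise(p.value = safe_p(value, flag), .groups = "drop") %>%
  # back to one column per group, named like the requested output
  pivot_wider(names_from = group, values_from = p.value,
              names_glue = "{group}_p.value")

z_tidy
```

The long format trades a little extra reshaping for not having to index rows and columns by hand; whether that is clearer than the nested apply() is a matter of taste.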

Hope this helps,
PJ

WOW!!!
Thank you so much!
I was thinking of doing this without writing a function, but guess it's better this way.

Hi,

Glad I could help. What I wrote isn't really a standalone custom function: apply() takes a function as an argument (here an anonymous one) and calls it once per row or column. It's the same idea as writing a loop, but more compact and, in some cases, more efficient.
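
To make the comparison concrete, here is a small sketch on a toy matrix (the matrix m and both result names are invented for this example) showing apply() with an anonymous function next to the equivalent for loop.

```r
# Toy 2 x 3 matrix: row 1 is 1, 3, 5 and row 2 is 2, 4, 6
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# apply() calls the anonymous function once per row (MARGIN = 1)
row_sums_apply <- apply(m, 1, function(row) sum(row))

# The same thing written as an explicit loop
row_sums_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_sums_loop[i] <- sum(m[i, ])
}

row_sums_apply   # 9 12
row_sums_loop    # 9 12
```

Using MARGIN = 2 instead would call the function once per column, which is what the outer apply(y, 2, ...) in the answer does.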

You can learn more about apply() here if you like.

PJ

Hi @pieterjanvc

I would like to add an additional column for each of my p-values calculated in "z." I can create a second object via:

zz <- apply( z, 2, function(pval){
    p.adjust(pval, method="fdr")
})

But I'm not sure how to merge the "z" and "zz" so that the output looks like this:

> z
  group1_pval group1_fdr group2_pval group2_fdr group3_pval group3_fdr
1           0          1           0          1           1          0
2           1          0           0          1           1          1
3           1          0           1          0           1          1
4           0          1           1          0           1          1
5           1          0           0          0           0          0

I have also tried:

z = apply(y, 2, function(group){
  
  #Make group True/False
  group = as.logical(group)
  
  #Check if no group is length 1 (can't do t-test)
  if(sum(group) == 1 | sum(!group) == 1){
    rep(NA, nrow(x))
  } else {
    #Do t test per compound per group
    p  <- apply(x[,-1], 1, function(compound){
      t.test(compound[group], compound[!group])$p.value
    })
  }
  q <- p.adjust( p, method="fdr")
#--- cbind like this does not work...
  cbind(p, q)
}) %>% as.data.frame() %>% 
  mutate(compound = x$compounds) %>% 
  select(compound, everything())

Hi,

You were close!

library('tidyverse')

set.seed(123)
x <- data.frame( compounds=c( "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 
                 sample1=sample( 1:6, replace=T, 10),
                 sample2=sample( 1:10, replace = T, 10), 
                 sample3=sample( 1:10, replace = T, 10), 
                 sample4=sample( 1:10, replace = T, 10),
                 sample5=sample( 1:10, replace = T, 10))

y <- data.frame( group1=sample( c(0,1), replace=T, 5),
                 group2=sample( c(0,1), replace=T, 5), 
                 group3=sample( c(0,1), replace=T, 5))
rownames(y) <- c( "sample1", "sample2", "sample3", "sample4", "sample5")

#For every group ...
z = apply(y, 2, function(group){
  
  #Make group True/False
  group = as.logical(group)
  
  #Skip if either side has fewer than 2 samples (can't do t-test)
  if(sum(group) < 2 || sum(!group) < 2){
    rep(NA, nrow(x))
  } else {
    #Do t test per compound per group
    apply(x[,-1], 1, function(compound){
      t.test(compound[group], compound[!group])$p.value
    })
  }
  
  
}) %>% as.data.frame() %>% 
  mutate(compound = x$compounds) %>% 
  select(compound, everything())

#Calculate FDR for all groups
fdr = apply(z[,-1], 2, function(pval){
  p.adjust(pval, method="fdr")
}) %>% as.data.frame()

#Update the column names
colnames(z)[-1] = paste(colnames(z)[-1], "pval", sep = "_")
colnames(fdr) = paste(colnames(fdr), "fdr", sep = "_")

#Merge and sort
z = cbind(z, fdr) 
z = z %>% select(sort(colnames(z)))
z
#>    compound group1_fdr group1_pval group2_fdr group2_pval group3_fdr
#> 1         a   0.899025   0.7657439         NA          NA  0.3432181
#> 2         b   0.899025   0.8147895         NA          NA  0.5521098
#> 3         c   0.899025   0.4609749         NA          NA  0.5147975
#> 4         d   0.899025   0.8990250         NA          NA  0.1249512
#> 5         e   0.899025   0.5413955         NA          NA  0.5147975
#> 6         f   0.899025   0.3886101         NA          NA  0.2886182
#> 7         g   0.899025   0.7160146         NA          NA  0.5521098
#> 8         h   0.899025   0.1180829         NA          NA  0.6992482
#> 9         i   0.899025   0.6374388         NA          NA  0.3432181
#> 10        j   0.899025   0.4907869         NA          NA  0.3432181
#>    group3_pval
#> 1   0.11528406
#> 2   0.45791385
#> 3   0.35287187
#> 4   0.01249512
#> 5   0.36035825
#> 6   0.05772363
#> 7   0.49689883
#> 8   0.69924825
#> 9   0.14802788
#> 10  0.17160903

Created on 2022-03-10 by the reprex package (v2.0.1.9000)
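
Note that sort() puts each _fdr column before its _pval column, while the requested layout had pval first. If that order matters, one possible variation is sketched below with dplyr's across(). The p-values in z_demo are fictitious and the names z_demo and groups are made up for this example; it assumes dplyr 1.0 or later.

```r
library(dplyr)

# Fictitious p-values standing in for z (before any renaming)
z_demo <- data.frame(
  compound = c("a", "b", "c"),
  group1 = c(0.01, 0.04, 0.30),
  group2 = c(NA_real_, NA_real_, NA_real_),
  group3 = c(0.20, 0.02, 0.03)
)

groups <- setdiff(colnames(z_demo), "compound")

z_demo <- z_demo %>%
  # add one FDR-adjusted column per group: group1_fdr, group2_fdr, ...
  mutate(across(all_of(groups), ~ p.adjust(.x, method = "fdr"),
                .names = "{.col}_fdr")) %>%
  # rename the raw p-value columns: group1_pval, group2_pval, ...
  rename_with(~ paste0(.x, "_pval"), all_of(groups)) %>%
  # interleave so each group's pval column comes right before its fdr column
  select(compound,
         all_of(as.vector(rbind(paste0(groups, "_pval"),
                                paste0(groups, "_fdr")))))

colnames(z_demo)
```

As in the answer above, the adjustment is done within each group column separately, not across all p-values at once.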

PJ

@priyankaigit Ahhh! select and sort.
Thanks!!!
