How to connect variables over columns for analysis

Hello Guys,

I am quite a newbie to RStudio and am using it for my M.D. thesis; I learnt R mainly from YouTube. I have searched for an answer for several days, but I am not really sure what the right term for this is...

The problem:
Is there a way to combine conditions from several columns for a t-test or Wilcoxon test? Can I add them to the t.test() call?
I have one big spreadsheet with parameters (the columns) such as genotype, plasma levels and medication, and I want to run some statistical tests. For example, I want to compare specimens of genotype "A" that got the medication with specimens that did not, but I have more than one genotype listed, medication yes/no and so on. The t.test() call currently just compares e.g. genotype "A" to "B", but I want to compare only the "A" and "B" specimens that also share the parameter "no medication".
So far I have created a new subset with filters for every single combination. It's quite a mess, and it feels like there should be a way to analyse everything from one tidy spreadsheet instead of splitting it into 100 new ones.

Thank you very much!

Hi,

I think I know what you are trying to do, but it's very difficult to be sure or give specific code without some data and an example of the output.

To help you with your question, we need a minimal reproducible example (reprex): a minimal (dummy) dataset plus the code that recreates the issue. Once we have that, we can go from there. For help on creating a reprex, see this guide:

Good luck!
PJ

Hi pieterjanvc,

I've read the FAQ and tried my best. The shortened data frame looks like this:

Forumhelp <- tibble::tribble(
~Genotyp, ~Medication, ~ANP.Expr., ~BNP.Expr.,
"db+", "contr", "0,01298", "1,02997",
"db+", "contr", "3,28628", "2,97766",
"db+", "contr", "0,92293", "2,51611",
"db+", "contr", "0,88408", "4,54664",
"db+", "treat", "1,71247", "2,02679",
"db+", "treat", "1,24132", "1,93011",
"db+", "treat", "2,19683", "1,36958",
"db+", "treat", "3,18670", "3,83384",
"dbdb", "contr", "0,18560", "0,23984",
"dbdb", "contr", "0,07026", "0,43382",
"dbdb", "contr", "0,16512", "0,08359",
"dbdb", "contr", "0,31011", "0,19674",
"dbdb", "treat", "0,12522", "0,11293",
"dbdb", "treat", "1,43110", "0,24108",
"dbdb", "treat", "0,03317", "0,28575",
"dbdb", "treat", "0,32317", "0,16888",
)

What I do now is create a subset for each combination:

library(psych)    # library() attaches one package per call
library(ggplot2)

dbdb <- subset(Forumhelp, Genotyp == "dbdb")
# ANP.Expr. must be numeric here (the values above are text with comma decimals)
t.test(ANP.Expr. ~ Medication, data = dbdb)

So the question is whether there is a way to fuse those two lines of code without creating a subset I only use for the calculation. For example, I want to test ANP as a function of medication, but only for the dbdb genotype.

If you need more information, I will provide it. Thank you!
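A minimal base-R sketch, assuming ANP.Expr. has already been converted to numbers: the formula method of t.test() (and of wilcox.test()) accepts a subset argument that is evaluated inside data, so the filter can go straight into the test call. The user's real genotype labels beyond db+/dbdb are not known, so the combined %in% condition below only uses the two from the example.

```r
# Example data from the thread, with comma decimals already converted to numbers
Forumhelp <- data.frame(
  Genotyp    = rep(c("db+", "dbdb"), each = 8),
  Medication = rep(rep(c("contr", "treat"), each = 4), times = 2),
  ANP.Expr.  = c(0.01298, 3.28628, 0.92293, 0.88408, 1.71247, 1.24132, 2.19683, 3.18670,
                 0.18560, 0.07026, 0.16512, 0.31011, 0.12522, 1.43110, 0.03317, 0.32317)
)

# ANP by medication, dbdb animals only: no separate subset object needed
res_dbdb <- t.test(ANP.Expr. ~ Medication, data = Forumhelp,
                   subset = Genotyp == "dbdb")

# Conditions on several columns combine with &: genotypes compared, untreated only
res_contr <- t.test(ANP.Expr. ~ Genotyp, data = Forumhelp,
                    subset = Genotyp %in% c("db+", "dbdb") & Medication == "contr")

# wilcox.test() has the same formula interface and the same subset argument
res_wilcox <- wilcox.test(ANP.Expr. ~ Medication, data = Forumhelp,
                          subset = Genotyp == "dbdb")

res_dbdb
```

With more genotypes in the real data, only the vector inside %in% and the Medication condition would change.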

Hi there,

Thanks, the data and code really helped!

Here is what I think you are trying to do:

library(tidyverse)

#The data
Forumhelp <- tribble(
  ~Genotyp, ~Medication, ~ANP.Expr., ~BNP.Expr.,
  "db+", "contr", "0,01298", "1,02997",
  "db+", "contr", "3,28628", "2,97766",
  "db+", "contr", "0,92293", "2,51611",
  "db+", "contr", "0,88408", "4,54664",
  "db+", "treat", "1,71247", "2,02679",
  "db+", "treat", "1,24132", "1,93011",
  "db+", "treat", "2,19683", "1,36958",
  "db+", "treat", "3,18670", "3,83384",
  "dbdb", "contr", "0,18560", "0,23984",
  "dbdb", "contr", "0,07026", "0,43382",
  "dbdb", "contr", "0,16512", "0,08359",
  "dbdb", "contr", "0,31011", "0,19674",
  "dbdb", "treat", "0,12522", "0,11293",
  "dbdb", "treat", "1,43110", "0,24108",
  "dbdb", "treat", "0,03317", "0,28575",
  "dbdb", "treat", "0,32317", "0,16888",
)

#Clean up the data
 # make it point instead of comma decimal
 # convert from string to number
Forumhelp = Forumhelp %>% mutate(
  ANP.Expr. = as.numeric(str_replace(ANP.Expr., ",", ".")),
  BNP.Expr. = as.numeric(str_replace(BNP.Expr., ",", "."))
)

#Group by Genotyp and do the t-test
 # store the result under a new name so Forumhelp keeps the cleaned data
ttest_results = Forumhelp %>% group_by(Genotyp) %>% 
  summarise(
    as.data.frame(t(unlist(
      t.test(ANP.Expr. ~ Medication)
      )))
  )

ttest_results
#> # A tibble: 2 x 13
#>   Genotyp statistic.t parameter.df p.value conf.int1 conf.int2 `estimate.mean ~
#> * <chr>   <chr>       <chr>        <chr>   <chr>     <chr>     <chr>           
#> 1 db+     -0.9898274~ 4.875534511~ 0.3688~ -2.92173~ 1.306207~ 1.2765675       
#> 2 dbdb    -0.9030789~ 3.139528466~ 0.4303~ -1.31066~ 0.719880~ 0.1827725       
#> # ... with 6 more variables: `estimate.mean in group treat` <chr>,
#> #   `null.value.difference in means` <chr>, stderr <chr>, alternative <chr>,
#> #   method <chr>, data.name <chr>

Created on 2021-02-12 by the reprex package (v1.0.0)
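About the comma decimals cleaned up above: if the spreadsheet is exported as a CSV with comma decimals (an assumption, since the original file format isn't stated), base R's read.csv2() already defaults to sep = ";" and dec = ",", so the columns arrive as numbers. The CSV text below is a hypothetical export built from a few rows of the example data; the last lines show the base-R equivalent of the str_replace() conversion.

```r
# Hypothetical CSV export: semicolon-separated, comma as decimal mark
csv_text <- "Genotyp;Medication;ANP.Expr.;BNP.Expr.
db+;contr;0,01298;1,02997
db+;treat;1,71247;2,02679
dbdb;contr;0,18560;0,23984
dbdb;treat;0,12522;0,11293"

# read.csv2() uses sep = ";" and dec = "," by default, so the numbers are read as numeric
Forumhelp_csv <- read.csv2(text = csv_text)
str(Forumhelp_csv)

# Base-R version of the str_replace() fix, for columns already read as text
anp_text <- c("0,01298", "3,28628", "0,92293")
anp_num  <- as.numeric(sub(",", ".", anp_text, fixed = TRUE))
anp_num
```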

The core function here is group_by() from the dplyr package (part of the tidyverse). It splits the data according to the groups you define and runs the rest of the pipeline within each split. I then used summarise() from the same package to run the t-test on each chunk of data. t.test() returns a list, so I used a few tricks (unlist(), t() and as.data.frame()) to turn that list into a one-row data frame; summarise() then binds these together with one row per group. Note that unlist() converts every value to text, which is why all columns in the output are <chr>.
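One way to keep the results numeric is to pull the needed fields out of each t.test() result by name instead of unlist()-ing the whole list. Below is a base-R sketch using split() and lapply(); the column names of the result table are my own choice, and the mean columns assume "contr" is the first group (R orders the groups alphabetically). In the tidyverse, broom::tidy() applied to a t.test() result does a similar job.

```r
# Example data from the thread, already numeric
Forumhelp <- data.frame(
  Genotyp    = rep(c("db+", "dbdb"), each = 8),
  Medication = rep(rep(c("contr", "treat"), each = 4), times = 2),
  ANP.Expr.  = c(0.01298, 3.28628, 0.92293, 0.88408, 1.71247, 1.24132, 2.19683, 3.18670,
                 0.18560, 0.07026, 0.16512, 0.31011, 0.12522, 1.43110, 0.03317, 0.32317)
)

# One Welch t-test per genotype: split the rows by Genotyp, test each piece
tests <- lapply(split(Forumhelp, Forumhelp$Genotyp), function(d) {
  t.test(ANP.Expr. ~ Medication, data = d)
})

# Extract the fields by name so every result column stays numeric
ttest_table <- do.call(rbind, lapply(names(tests), function(g) {
  tt <- tests[[g]]
  data.frame(
    Genotyp    = g,
    statistic  = unname(tt$statistic),
    df         = unname(tt$parameter),
    p.value    = tt$p.value,
    conf.low   = tt$conf.int[1],
    conf.high  = tt$conf.int[2],
    mean.contr = unname(tt$estimate[1]),  # first group: "contr"
    mean.treat = unname(tt$estimate[2])   # second group: "treat"
  )
}))

ttest_table
```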

You could also do this with a loop or a mapping function and then either merge the resulting data frames at the end or just keep a list. In that case you would use this instead:

ttest_list = Forumhelp %>% group_by(Genotyp) %>% 
  group_map(
     ~t.test(ANP.Expr. ~ Medication, data = .x)
  )
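The same idea addresses the "100 subsets" problem: list the combinations once and let R loop over them. A base-R sketch, assuming both expression columns are numeric; testing every outcome for every genotype is only an illustrative choice of combinations.

```r
# Example data from the thread, already numeric
Forumhelp <- data.frame(
  Genotyp    = rep(c("db+", "dbdb"), each = 8),
  Medication = rep(rep(c("contr", "treat"), each = 4), times = 2),
  ANP.Expr.  = c(0.01298, 3.28628, 0.92293, 0.88408, 1.71247, 1.24132, 2.19683, 3.18670,
                 0.18560, 0.07026, 0.16512, 0.31011, 0.12522, 1.43110, 0.03317, 0.32317),
  BNP.Expr.  = c(1.02997, 2.97766, 2.51611, 4.54664, 2.02679, 1.93011, 1.36958, 3.83384,
                 0.23984, 0.43382, 0.08359, 0.19674, 0.11293, 0.24108, 0.28575, 0.16888)
)

# Every outcome/genotype combination to test
combos <- expand.grid(outcome  = c("ANP.Expr.", "BNP.Expr."),
                      genotype = unique(Forumhelp$Genotyp),
                      stringsAsFactors = FALSE)

# For each row: build "outcome ~ Medication", keep only that genotype, store the p-value
combos$p.value <- mapply(function(outcome, genotype) {
  fml <- reformulate("Medication", response = outcome)
  t.test(fml, data = Forumhelp[Forumhelp$Genotyp == genotype, ])$p.value
}, combos$outcome, combos$genotype, USE.NAMES = FALSE)

combos
```

Adding a genotype or outcome then means extending one vector rather than creating new subsets.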

Have a look at the dplyr documentation if you'd like to learn more about what it can do.

Hope this helps,
PJ
