2 groups - multiple variables

Thanks for posting a reprex for discussion! As I said, the details depend on how the data is organized, so it’s useful to all be talking about the same example. In the case you proposed, not only is the predictor variable not the first one, you also only want to use a subset of the variables as responses. In that case, you might do something like this:

library(tidyverse)

cars_dat <- mtcars

resp_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

t_tests_base <- lapply(
  cars_dat[resp_vars],
  function(x) { t.test(x ~ am, data = cars_dat) }
)

t_tests_base$disp
#> 
#>  Welch Two Sample t-test
#> 
#> data:  x by am
#> t = 4.1977, df = 29.258, p-value = 0.00023
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>   75.32779 218.36857
#> sample estimates:
#> mean in group 0 mean in group 1 
#>        290.3789        143.5308

t_tests_tidy <- map(
  select(cars_dat, all_of(resp_vars)),
  ~ t.test(.x ~ am, data = cars_dat)
)

t_tests_tidy$disp
#> 
#>  Welch Two Sample t-test
#> 
#> data:  .x by am
#> t = 4.1977, df = 29.258, p-value = 0.00023
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>   75.32779 218.36857
#> sample estimates:
#> mean in group 0 mean in group 1 
#>        290.3789        143.5308

Created on 2018-08-26 by the reprex package (v0.2.0).
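Once you have a list of test results, you will usually want the key numbers side by side rather than printing each test in turn. Here is a minimal sketch in base R, assuming the same `cars_dat` and `resp_vars` as in the reprex (rebuilt here so it runs on its own); the name `t_summary` is just my choice for illustration:

```r
# Same setup as the reprex above
cars_dat <- mtcars
resp_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

t_tests_base <- lapply(
  cars_dat[resp_vars],
  function(x) t.test(x ~ am, data = cars_dat)
)

# Each element is an "htest" object, so the same components
# (statistic, parameter, p.value) can be pulled out of every one
t_summary <- data.frame(
  variable  = names(t_tests_base),
  t         = sapply(t_tests_base, function(tt) unname(tt$statistic)),
  df        = sapply(t_tests_base, function(tt) unname(tt$parameter)),
  p_value   = sapply(t_tests_base, function(tt) tt$p.value),
  row.names = NULL
)

t_summary
```

Keep in mind these are six separate tests, so the p-values are not adjusted for multiple comparisons; `p.adjust()` is one way to handle that if it matters for your analysis.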

While you can treat R as a command-line controlled system for executing statistical procedures, where your goal is just to look up and run the “right” series of commands for a given task, you will get further faster if you start thinking about it as a language for statistical computing. That means familiarizing yourself with the syntax that lets you express how to do things in R (for instance, “take this data frame and select only these n columns from it”), so that you eventually gain the power to solve whatever challenges your data throws at you.
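To make the “select only these n columns” example concrete, here is a small sketch of a few ways to say it in base R. The three columns picked here are arbitrary, chosen only for illustration:

```r
cars_dat <- mtcars
keep <- c("mpg", "wt", "am")

# By name, using a character vector (the form used in the reprex)
by_name <- cars_dat[keep]

# By position: mpg, wt and am are columns 1, 6 and 9 of mtcars
by_pos <- cars_dat[, c(1, 6, 9)]

# With subset(), which lets you write the names unquoted
by_subset <- subset(cars_dat, select = c(mpg, wt, am))

# Check whether the three approaches give the same data frame
all.equal(by_name, by_pos)
all.equal(by_name, by_subset)
```

Selecting by name is usually the safer habit, since it keeps working if the column order changes.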

If you just want to translate your existing way of doing things into R terms (not a bad starting point when there is work to be done!), this book might help:


There’s also a DataCamp course from the same author:

If you want to start learning how to use R more fluently on its own terms, this thread has lots of good resources: What's your favorite intro to R?

As for ANOVA: is it as simple? If we’re talking about how to run a bunch of ANOVAs on a data frame of variables, yes, the same code pattern applies. If we’re talking about how to code the model itself, there are several ways to do it. ANOVA is an interesting case because there is an important difference between the sums of squares calculation preferred by pure statisticians and the one that has become conventional in biostats circles (driven partly by what SAS and SPSS decided to bake in). Base R, having been written by members of one community, defaults to a method (sequential, or Type I, sums of squares) that doesn’t please the other community, which generally expects Type III. For discussion and approaches to coding One-Way ANOVA (you didn’t say what kind of model you were looking for), see:
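Separately from that discussion, here is a rough sketch of what “the same code pattern” looks like for ANOVA. I'm assuming a one-way model with `cyl` (three groups in `mtcars`) as the grouping variable, which is my choice for illustration and not something from your example. With a single factor the sums of squares question doesn't come up, since the different types agree:

```r
cars_dat <- mtcars
# aov() needs the grouping variable as a factor, otherwise cyl is
# treated as a numeric predictor; cyl_f is a name made up for this sketch
cars_dat$cyl_f <- factor(cars_dat$cyl)

resp_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Same lapply() pattern as the t-tests, with aov() swapped in
aov_fits <- lapply(
  cars_dat[resp_vars],
  function(x) aov(x ~ cyl_f, data = cars_dat)
)

# Inspect one fit
summary(aov_fits$mpg)

# Pull the F-test p-value for the grouping factor out of every fit
aov_p <- sapply(
  aov_fits,
  function(fit) summary(fit)[[1]][["Pr(>F)"]][1]
)

aov_p
```

Once the model has more than one term and the design is unbalanced, the choice of sums of squares starts to matter, and that is where the linked discussion comes in.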
