Running multiple regression models with the help of a matrix in R

I want to run multiple regression models in parallel. The approach I am trying is as follows:
Say we have a dataset with one dependent variable (DV), y, and four independent variables (IVs): x1, x2, x3 and x4. I want to run all possible regression models:
y with x1
y with x2
y with x3
y with x4
y with x1, x2
...
y with x1, x2, x3, x4

That gives (2^4) - 1 = 15 models. I want a matrix of indicator variables with one row per model, so that for each row I can run the regression with the regular lm function.

x1 x2 x3 x4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 1 0 0
1 0 1 0
1 0 0 1
0 1 1 0
0 1 0 1
0 0 1 1
1 1 1 0
1 1 0 1
1 0 1 1
0 1 1 1
1 1 1 1

Is this possible?
If not, is there any other way to do this?
Any kind of guidance will really be helpful.

Thanks!

You could create such a matrix with expand.grid(0:1,0:1,0:1,0:1) or, to make it easier with larger numbers of variables, expand.grid(replicate(4, 0:1, simplify=FALSE)), but what about creating the regression formulas instead:

library(tidyverse)

formulas = map(1:4, ~ combn(paste0("x",1:4), .x) %>% 
      apply(., 2, function(v) paste0("y ~ ", paste(v, collapse=" + ")))) %>% 
  unlist

formulas
 [1] "y ~ x1"                "y ~ x2"                "y ~ x3"               
 [4] "y ~ x4"                "y ~ x1 + x2"           "y ~ x1 + x3"          
 [7] "y ~ x1 + x4"           "y ~ x2 + x3"           "y ~ x2 + x4"          
[10] "y ~ x3 + x4"           "y ~ x1 + x2 + x3"      "y ~ x1 + x2 + x4"     
[13] "y ~ x1 + x3 + x4"      "y ~ x2 + x3 + x4"      "y ~ x1 + x2 + x3 + x4"

Then to run the regressions:

models = map(formulas, ~lm(.x, data=df))
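
The call above assumes a data frame named df with columns y and x1 to x4, which the thread never shows. Here is a minimal sketch with simulated data, written in base R so it runs without extra packages. The simulated data, the seed, and the comparison by adjusted R-squared are my own illustrative assumptions, not part of the original answer:

```r
# Simulated data standing in for the asker's dataset (an assumption).
set.seed(1)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
df$y <- 1 + 2 * df$x1 - df$x3 + rnorm(n)

# Build the same 15 formula strings with base R only.
vars <- paste0("x", 1:4)
formulas <- unlist(lapply(seq_along(vars), function(k) {
  combn(vars, k, FUN = function(v) paste("y ~", paste(v, collapse = " + ")))
}))

# Fit one model per formula; as.formula() makes the string conversion explicit.
models <- lapply(formulas, function(f) lm(as.formula(f), data = df))
names(models) <- formulas

# Collect one summary statistic per model so the fits can be compared.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
head(sort(adj_r2, decreasing = TRUE), 3)
```

Naming the list by formula keeps each fit identifiable, so something like models[["y ~ x1 + x3"]] pulls out a single model.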

There are also some packages with features for doing subsets regression. For example, here.


The answer Joel posted above is awesome!

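But since you specifically asked for a way to create such a matrix, I'd like to add a more concise alternative to the expand.grid(replicate(4, 0:1, simplify=FALSE)) he suggested: expand.grid(rep(list(0:1), 4)). Note that the result has 2^4 = 16 rows, because it includes the all-zero row (no predictors); drop that row to get the 15 models.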

expand.grid(rep(list(0:1), 4))
#>    Var1 Var2 Var3 Var4
#> 1     0    0    0    0
#> 2     1    0    0    0
#> 3     0    1    0    0
#> 4     1    1    0    0
#> 5     0    0    1    0
#> 6     1    0    1    0
#> 7     0    1    1    0
#> 8     1    1    1    0
#> 9     0    0    0    1
#> 10    1    0    0    1
#> 11    0    1    0    1
#> 12    1    1    0    1
#> 13    0    0    1    1
#> 14    1    0    1    1
#> 15    0    1    1    1
#> 16    1    1    1    1
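
To go from this matrix to fitted models, one option is to loop over the rows and build each formula from the columns flagged with 1. This is only a sketch: the data frame df is simulated here as a stand-in for the real data, and the variable names vars, grid and models are my own choices.

```r
# Indicator matrix: one row per model, one column per predictor.
vars <- paste0("x", 1:4)
grid <- expand.grid(rep(list(0:1), length(vars)))
names(grid) <- vars
grid <- as.matrix(grid[rowSums(grid) > 0, ])  # drop the all-zero row

# Simulated data standing in for the real dataset (an assumption).
set.seed(1)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
df$y <- 1 + 2 * df$x1 - df$x3 + rnorm(n)

# For each row, keep the predictors marked with 1 and fit that model.
models <- lapply(seq_len(nrow(grid)), function(i) {
  keep <- vars[grid[i, ] == 1]
  lm(reformulate(keep, response = "y"), data = df)
})

# The first row is 1 0 0 0, so the first model uses x1 only.
names(coef(models[[1]]))
```

reformulate() builds the formula object from a character vector of term labels, which avoids pasting strings together by hand.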