Creating a tidy formula

Consider this simple tibble:

data <- tibble(y = c(1,2,3,4,5),
               var1 = c(20,19,20,30,10),
               var2 = c(21,13,21,31,10),
               boo1 = c(40,40,40,40,2),
               boo2 = c(1,2,34,40,2))
# A tibble: 5 x 5
      y  var1  var2  boo1  boo2
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    20    21    40     1
2     2    19    13    40     2
3     3    20    21    40    34
4     4    30    31    40    40
5     5    10    10     2     2

I would like to define an lm formula based on a regex condition on my tibble's column names.

Something like


data %>% 
  names() %>%  
  str_subset(.,regex('var')) %>% 
  paste('y ~', ., collapse = '+') %>% 
  as.formula() %>% 
  lm(., data = data)

which fails at several steps. For instance, the paste step gives "y ~ var1+y ~ var2" and the lm step simply does not work.

What should I try instead?
Thanks!

paste() prepends 'y ~' to every name before collapsing, and lm() only takes a single formula. If you want one model per variable, you could iterate using purrr::map()

library(tidyverse)
data <- tibble(y = c(1,2,3,4,5),
               var1 = c(20,19,20,30,10),
               var2 = c(21,13,21,31,10),
               boo1 = c(40,40,40,40,2),
               boo2 = c(1,2,34,40,2))
data %>% 
    names() %>%  
    str_subset(.,regex('var')) %>%
    map(~paste('y ~', ., collapse = '+')) %>%
    map(~lm(., data = data))
#> [[1]]
#> 
#> Call:
#> lm(formula = ., data = data)
#> 
#> Coefficients:
#> (Intercept)         var1  
#>     3.88745     -0.04482  
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = ., data = data)
#> 
#> Coefficients:
#> (Intercept)         var2  
#>     3.28571     -0.01488
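
As a small sketch of why the original paste() call produced "y ~ var1+y ~ var2": paste() first pastes 'y ~' onto each element of the vector, and only then joins the pieces with collapse. The `vars` vector below is typed in by hand as a stand-in for the output of the str_subset() step.

```r
# Hand-typed stand-in for the names matched by str_subset(., regex('var'))
vars <- c("var1", "var2")

# Without collapse, paste() returns one string per variable
step1 <- paste("y ~", vars)
step1
#> [1] "y ~ var1" "y ~ var2"

# collapse then joins those complete strings, repeating "y ~" each time
step2 <- paste("y ~", vars, collapse = "+")
step2
#> [1] "y ~ var1+y ~ var2"
```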

Thanks Andres, but what I actually want is to run the single regression y ~ var1 + var2.

You need to use paste() in two steps: first collapse the matching names with " + ", then put "y ~" in front of the result. I used grep(), but you can use str_subset() if you prefer; the result is the same.

library(magrittr)

dataset <- tibble::tibble(y = c(1, 2, 3, 4, 5),
                          var1 = c(20, 19, 20, 30, 10),
                          var2 = c(21, 13, 21, 31, 10),
                          boo1 = c(40, 40, 40, 40, 2),
                          boo2 = c(1, 2, 34, 40, 2))

dataset %>%
  names() %>%
  grep(pattern = "var",
       value = TRUE) %>%
  paste(collapse = " + ") %>%
  paste("y", .,
        sep = " ~ ") %>%
  lm(data = dataset)
#> 
#> Call:
#> lm(formula = ., data = dataset)
#> 
#> Coefficients:
#> (Intercept)         var1         var2  
#>      4.2221      -0.2149       0.1580

Created on 2019-06-10 by the reprex package (v0.3.0)

Hope this helps.
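
A possible alternative, if you would rather not assemble the string by hand, is base R's reformulate(), which builds a formula object from a character vector of term labels and a response. This is only a sketch: it uses a plain data.frame instead of a tibble to stay dependency-free, and the names `predictors`, `f`, and `fit` are just illustrative.

```r
# Same data as above, as a base data.frame (assumption: lm() treats it like the tibble)
dataset <- data.frame(y = c(1, 2, 3, 4, 5),
                      var1 = c(20, 19, 20, 30, 10),
                      var2 = c(21, 13, 21, 31, 10),
                      boo1 = c(40, 40, 40, 40, 2),
                      boo2 = c(1, 2, 34, 40, 2))

# Column names matching the pattern
predictors <- grep("var", names(dataset), value = TRUE)

# reformulate() builds response ~ term1 + term2 + ... from the character vector
f <- reformulate(predictors, response = "y")
f
#> y ~ var1 + var2

# Fit the single regression on all matching columns
fit <- lm(f, data = dataset)
fit
```

Because `f` is a real formula object rather than a string, the Call shown by lm() may be a little easier to trace back, and the same `f` can be passed to other modelling functions.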
