# Creating a tidy formula

Consider this simple `tibble`:

``````
library(tidyverse)

data <- tibble(y = c(1,2,3,4,5),
               var1 = c(20,19,20,30,10),
               var2 = c(21,13,21,31,10),
               boo1 = c(40,40,40,40,2),
               boo2 = c(1,2,34,40,2))
data
#> # A tibble: 5 x 5
#>       y  var1  var2  boo1  boo2
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    20    21    40     1
#> 2     2    19    13    40     2
#> 3     3    20    21    40    34
#> 4     4    30    31    40    40
#> 5     5    10    10     2     2
``````

I would like to define an `lm` formula based on a regex condition on my `tibble` column names.

Something like:

``````
data %>%
names() %>%
str_subset(.,regex('var')) %>%
paste('y ~', ., collapse = '+') %>%
as.formula() %>%
lm(., data = data)
``````

This fails at several steps. For instance, the `paste` step gives `"y ~ var1+y ~ var2"`, and the `lm` step simply does not work.

Thanks!

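`paste()` is vectorized over its arguments, so it builds one `"y ~ ..."` string per column before `collapse` joins them, and `lm()` accepts only a single formula. To fit one model per matching column, you could iterate using `purrr::map()`: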

``````
library(tidyverse)
data <- tibble(y = c(1,2,3,4,5),
var1 = c(20,19,20,30,10),
var2 = c(21,13,21,31,10),
boo1 = c(40,40,40,40,2),
boo2 = c(1,2,34,40,2))
data %>%
names() %>%
str_subset(.,regex('var')) %>%
map(~paste('y ~', ., collapse = '+')) %>%
map(~lm(., data = data))
#> [[1]]
#>
#> Call:
#> lm(formula = ., data = data)
#>
#> Coefficients:
#> (Intercept)         var1
#>     3.88745     -0.04482
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = ., data = data)
#>
#> Coefficients:
#> (Intercept)         var2
#>     3.28571     -0.01488
``````
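To see why the original `paste` call produced `"y ~ var1+y ~ var2"`, here is a minimal sketch. The names `vars`, `per_var`, and `collapsed` are only illustrative and are not part of the original code.

```r
# Column names that matched the regex in the question
vars <- c("var1", "var2")

# paste() recycles "y ~" across the vector: one complete string per variable
per_var <- paste("y ~", vars)

# collapse then joins those complete strings, so the left-hand side is repeated
collapsed <- paste("y ~", vars, collapse = "+")

print(per_var)
print(collapsed)
```

Each element already carries its own `y ~`, which is why mapping over them fits one model per variable rather than a single combined model.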

Thanks, Andres, but what I actually want is to run the single regression `y ~ var1 + var2`.

You need to call `paste` in two steps: first collapse the matching column names with `" + "`, then prepend `y ~`. I used `grep`, but `str_subset` works equally well if you prefer it, and the result is the same.

``````
library(magrittr)

dataset <- tibble::tibble(y = c(1, 2, 3, 4, 5),
var1 = c(20, 19, 20, 30, 10),
var2 = c(21, 13, 21, 31, 10),
boo1 = c(40, 40, 40, 40, 2),
boo2 = c(1, 2, 34, 40, 2))

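dataset %>%
  names() %>%
  # keep only the column names that match the pattern
  grep(pattern = "var",
       value = TRUE) %>%
  # step 1: join the predictors into "var1 + var2"
  paste(collapse = " + ") %>%
  # step 2: prepend the response to get "y ~ var1 + var2"
  paste("y", .,
        sep = " ~ ") %>%
  # lm() coerces the string to a formula
  lm(data = dataset)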
#>
#> Call:
#> lm(formula = ., data = dataset)
#>
#> Coefficients:
#> (Intercept)         var1         var2
#>      4.2221      -0.2149       0.1580
``````

Created on 2019-06-10 by the reprex package (v0.3.0)

Hope this helps.
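As an alternative to pasting strings, base R's `stats::reformulate()` builds the formula object directly from a character vector of term labels. The sketch below is not the approach used above, just another option; it uses a plain `data.frame` to stay dependency-free (a tibble should behave the same way here), and the names `vars`, `f`, and `fit` are illustrative.

```r
# Same data as above, as a base data.frame to avoid extra dependencies
dataset <- data.frame(y = c(1, 2, 3, 4, 5),
                      var1 = c(20, 19, 20, 30, 10),
                      var2 = c(21, 13, 21, 31, 10),
                      boo1 = c(40, 40, 40, 40, 2),
                      boo2 = c(1, 2, 34, 40, 2))

# Select predictor names by regex
vars <- grep("var", names(dataset), value = TRUE)

# Build y ~ var1 + var2 as a formula object, with no string pasting
f <- reformulate(vars, response = "y")

fit <- lm(f, data = dataset)
print(f)
print(coef(fit))
```

Because `f` is a real formula, the printed `Call:` of the model shows `lm(formula = f, data = dataset)` rather than `formula = .`.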
