Linear regression over one variable and one list

Hello,
I've found some examples that loop a linear regression over one variable.
They use group_map and broom, and that works fine.
But I can't find any information about using map2 or group_map2 (if it exists) to loop over two variables, or a list of variables, when running the regression.

library(gapminder)
library(dplyr)

data <- gapminder
data
explic <- list("year", "gdpPercap")
data %>%
  group_by(country) %>%
  group_modify(~ broom::glance(lm(lifeExp ~ explic, data = .x)))

The code fails. If I replace explic in the lm call with a single variable, such as year, it works and computes one regression per country.

The example may look strange, but I have read examples that loop over one variable with a regression and map.
As always, thanks for your time and interest.
Have a nice day.
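
For readers hitting the same error: lm() treats explic in lifeExp ~ explic as the name of a single predictor, finds the list itself, and cannot use it as a model column. A minimal sketch of the problem and one common workaround follows, using the built-in mtcars data and base R's reformulate() instead of gapminder; the variable names wt, hp and mpg are stand-ins, not part of the original question.

```r
# Assumption: mtcars stands in for gapminder, mpg for lifeExp.
vars <- list("wt", "hp")

# Passing the list straight into the formula fails, as in the question.
bad <- tryCatch(lm(mpg ~ vars, data = mtcars), error = function(e) e)
inherits(bad, "error")

# Building one formula per variable name works.
fits <- lapply(vars, function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})
names(fits) <- unlist(vars)

# One R-squared per predictor
sapply(fits, function(fit) summary(fit)$r.squared)
```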

Nested data frames work better in a long format. It is not completely clear to me what your objective is, but is this close?

library(gapminder)
library(tidyverse)

explic<-list("year","gdpPercap")

gapminder %>%
    pivot_longer(cols = c(-country, -lifeExp, -continent),
                 names_to = "independent_var",
                 values_to = "value") %>% 
    filter(independent_var %in% explic) %>% 
    group_nest(country, independent_var) %>% 
    mutate(model = map(data, ~broom::glance(lm(lifeExp~value, data=.x)))) %>% 
    select(-data) %>% 
    unnest(model)
#> # A tibble: 284 × 14
#>    country     independent_var r.squared adj.r.squared sigma statistic  p.value
#>    <fct>       <chr>               <dbl>         <dbl> <dbl>     <dbl>    <dbl>
#>  1 Afghanistan gdpPercap         0.00226     -0.0975   5.34     0.0227 8.83e- 1
#>  2 Afghanistan year              0.948        0.942    1.22   181.     9.84e- 8
#>  3 Albania     gdpPercap         0.701        0.671    3.63    23.4    6.82e- 4
#>  4 Albania     year              0.911        0.902    1.98   102.     1.46e- 6
#>  5 Algeria     gdpPercap         0.818        0.800    4.63    45.0    5.33e- 5
#>  6 Algeria     year              0.985        0.984    1.32   662.     1.81e-10
#>  7 Angola      gdpPercap         0.0906      -0.000286 4.01     0.997  3.42e- 1
#>  8 Angola      year              0.888        0.877    1.41    79.1    4.59e- 6
#>  9 Argentina   gdpPercap         0.692        0.661    2.44    22.4    7.97e- 4
#> 10 Argentina   year              0.996        0.995    0.292 2246.     4.22e-13
#> # … with 274 more rows, and 7 more variables: df <dbl>, logLik <dbl>,
#> #   AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>, nobs <int>

Created on 2022-07-28 by the reprex package (v2.0.1)


Thanks for your code.
But I have a doubt about it.
If you are working with a survey, using pivot_longer can change some statistics, such as the variance and related measures.
That's why I need something like map2 over a list of variables.
across would also help me do the task while preserving the data structure.
I get stuck trying it...

R functions do not modify their inputs in place, so the original data frame structure is safe. The linear model fitting is not affected either, since it is applied to nested data anyway, so it is equivalent to manually selecting the independent variables one by one.
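
A small sketch of that point, assuming a made-up three-row data frame and using base R's stack() as a stand-in for pivot_longer(): the reshaped copy is a new object and the wide original is left untouched.

```r
# Hypothetical toy data, not taken from gapminder
wide <- data.frame(id = 1:3,
                   year = c(1952, 1957, 1962),
                   gdp = c(779, 821, 853))
before <- wide

# Reshape to long format; stack() returns a new data frame
long <- stack(wide[c("year", "gdp")])
long$id <- rep(wide$id, times = 2)

# The original wide data frame has not changed
identical(wide, before)
```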

across() is not meant to be used within the formula argument. I don't see a way to perform this in a wide format, and also, how would you integrate the glance() output within the original data frame structure? There is no sensible way that I can imagine.

I'll test the code you kindly wrote on a survey with a complex design.
I hope you are right...

I can't vouch for the validity of your approach from a statistical standpoint, but to test my code you only need to manually filter by any country and perform a linear regression. The results are exactly the same.

library(gapminder)
library(tidyverse)

explic<-list("year","gdpPercap")

gapminder %>%
    pivot_longer(cols = c(-country, -lifeExp, -continent),
                 names_to = "independent_var",
                 values_to = "value") %>% 
    filter(independent_var %in% explic) %>% 
    group_nest(country, independent_var) %>% 
    mutate(model = map(data, ~broom::glance(lm(lifeExp~value, data=.x)))) %>% 
    select(-data) %>% 
    unnest(model)
#> # A tibble: 284 × 14
#>    country     independent_var r.squared adj.r.squared sigma statistic  p.value
#>    <fct>       <chr>               <dbl>         <dbl> <dbl>     <dbl>    <dbl>
#>  1 Afghanistan gdpPercap         0.00226     -0.0975   5.34     0.0227 8.83e- 1
#>  2 Afghanistan year              0.948        0.942    1.22   181.     9.84e- 8
#>  3 Albania     gdpPercap         0.701        0.671    3.63    23.4    6.82e- 4
#>  4 Albania     year              0.911        0.902    1.98   102.     1.46e- 6
#>  5 Algeria     gdpPercap         0.818        0.800    4.63    45.0    5.33e- 5
#>  6 Algeria     year              0.985        0.984    1.32   662.     1.81e-10
#>  7 Angola      gdpPercap         0.0906      -0.000286 4.01     0.997  3.42e- 1
#>  8 Angola      year              0.888        0.877    1.41    79.1    4.59e- 6
#>  9 Argentina   gdpPercap         0.692        0.661    2.44    22.4    7.97e- 4
#> 10 Argentina   year              0.996        0.995    0.292 2246.     4.22e-13
#> # … with 274 more rows, and 7 more variables: df <dbl>, logLik <dbl>,
#> #   AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>, nobs <int>

gapminder %>% 
    filter(country == "Afghanistan") %>% 
    lm(formula = "lifeExp~gdpPercap", data = .) %>% 
    broom::glance()
#> # A tibble: 1 × 12
#>   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1   0.00226       -0.0975  5.34    0.0227   0.883     1  -36.0  78.1  79.5
#> # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Created on 2022-07-28 by the reprex package (v2.0.1)

I forgot something important.
The srvyr and survey packages don't work with pivot wider...
I think that's related to how they handle the variance and similar quantities.
I know your code works flawlessly under any other circumstance, andresrcs.
But dealing with surveys is a bit different...

Maybe you should put together a reprex that better represents your specific use case.

library(srvyr)
library(survey)
library(broom)

data(api)  # loads apiclus1 from the survey package

dclus1 <- apiclus1 %>%
  as_survey_design(dnum, weights = pw, fpc = fpc)

dclus1 %>% 
  srvyr::group_by(stype) %>% 
  group_map_dfr(~glance(svyglm(meals ~ pcttest, .)))


ind_var <- list("sch.wide", "awards", "pcttest")
ind_var


The idea is to apply group_map_dfr over the list (ind_var), without having to write out each variable in ind_var separately, as I do below...

dclus1 %>% 
  srvyr::group_by(stype) %>% 
  group_map_dfr(~glance(svyglm(meals ~ sch.wide, .)))

dclus1 %>% 
  srvyr::group_by(stype) %>% 
  group_map_dfr(~glance(svyglm(meals ~ awards, .)))

dclus1 %>% 
  srvyr::group_by(stype) %>% 
  group_map_dfr(~glance(svyglm(meals ~ pcttest, .)))


I can't get any further than mapping over one variable with group_map_dfr... Looping over the list of variables is where I can't make progress.

library(srvyr)
library(survey)
library(broom)
library(purrr)
library(glue)

data("api")

dclus1 <- apiclus1 %>%
  as_survey_design(dnum, weights = pw, fpc = fpc)

# dclus1 %>%
#   srvyr::group_by(stype) %>%
#   group_map_dfr(~glance(svyglm(meals ~ pcttest, .)))


(ind_var <- list("sch.wide", "awards", "pcttest"))


map(
  ind_var,
  ~ {
    ind <- .x
    myf <- glue("meals ~ {ind}")
    dclus1 %>%
      srvyr::group_by(stype) %>%
      group_map_dfr(~ glance(svyglm(as.formula(myf), .)) %>% 
                      mutate(label = ind))
  }
)
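
The map() call above returns a list with one data frame per variable. As a rough sketch of the same pattern that also binds everything into a single table, here is a version that runs without the survey packages: it assumes plain lm() on the built-in mtcars data, grouped by cyl, in place of svyglm() on dclus1 grouped by stype. The helper fit_one and the reported columns are hypothetical choices for illustration.

```r
# Assumption: mtcars/cyl/mpg stand in for dclus1/stype/meals.
ind_var <- c("wt", "hp", "disp")
groups <- split(mtcars, mtcars$cyl)

# Fit one regression for one predictor within one group
fit_one <- function(ind, group_name) {
  f <- reformulate(ind, response = "mpg")
  fit <- lm(f, data = groups[[group_name]])
  data.frame(group = group_name,
             label = ind,
             r.squared = summary(fit)$r.squared,
             nobs = nobs(fit))
}

# Every predictor crossed with every group, then row-bound into one table
grid <- expand.grid(ind = ind_var, group = names(groups),
                    stringsAsFactors = FALSE)
results <- do.call(rbind, Map(fit_one, grid$ind, grid$group))
rownames(results) <- NULL
results
```

With the survey design, the same idea would presumably be to row-bind the list that map() returns, for example with dplyr::bind_rows().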