How to unnest columns with different numbers of elements?

eoppe1022 · June 10, 2018, 8:01pm

So I'm trying to fit a separate model for each of several variables in a data set, within each league, so that I can get the residuals from every model.

Reprex isn't working for this example, so I'm going to do my best to show what's going on.

library(tidyverse)

# `data` is my original data set (not included here)
mydata <- data %>%
  mutate(is_usa = if_else(team == "USA", 1, 0)) %>%
  nest(-league) %>%                       # one row per league
  mutate(
    # one linear model per response variable, fitted within each league
    model_all = map(data, ~lm(betweenness_all ~ is_usa * gp, data = .)),
    model_5v5 = map(data, ~lm(betweenness_5v5 ~ is_usa * gp, data = .)),
    model_ev  = map(data, ~lm(betweenness_ev  ~ is_usa * gp, data = .)),
    model_pp  = map(data, ~lm(betweenness_pp  ~ is_usa * gp, data = .)),
    model_pk  = map(data, ~lm(betweenness_pk  ~ is_usa * gp, data = .)),
    # broom::augment() adds fitted values and residuals for the rows each model used
    model_all_resid = map(model_all, broom::augment),
    model_5v5_resid = map(model_5v5, broom::augment),
    model_ev_resid  = map(model_ev,  broom::augment),
    model_pp_resid  = map(model_pp,  broom::augment),
    model_pk_resid  = map(model_pk,  broom::augment)
  )

mydata

which prints a nested tibble with one row per `league`.

Now I can unnest one model's residuals without any problem:

mydata %>%
  unnest(model_all_resid)

But unnesting more than one model's residuals at once fails with an error:

mydata %>%
  unnest(model_all_resid, model_5v5_resid)

Any idea what I'm doing wrong?

For what it's worth, I'm largely basing all of this on @alistaire's Stack Exchange answer here, though I don't run into the same issue with the mtcars data.

alistaire · June 11, 2018, 1:13am

Without your data this is only a guess, but one way to get that error is if the variables used by each model contain different numbers of NAs. With the default na.action (na.omit), lm() drops every row that has a missing value in any variable the model uses, so those rows don't appear in the residuals() or augment() results. When the two columns you unnest therefore have different numbers of rows, unnest() doesn't know how to line them up and throws an error:

library(tidyverse)

mtcars %>% as_tibble() %>%    # make it a tibble so it prints nicely
    mutate(hp = na_if(hp, 110)) %>%    # add some NAs to one model's predictor
    nest(-cyl) %>% 
    mutate(wt_model = map(data, ~lm(mpg ~ wt, .x)), 
           hp_model = map(data, ~lm(mpg ~ hp, .x)), 
           wt_resid = map(wt_model, residuals), 
           hp_resid = map(hp_model, residuals)) %>% 
    unnest(wt_resid, hp_resid)
#> Error: All nested columns must have the same number of elements.
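A quick way to check whether this is what's happening is to compare, per group, how many rows each model actually used. The sketch below reuses the same mtcars setup with artificial NAs; it assumes tidyr 1.0 or later (hence `nest(data = -cyl)`), and `nested` and `check` are just illustrative names. `nobs()` from base R's stats package reports how many observations a fitted lm used.

```r
library(tidyverse)

nested <- mtcars %>%
  as_tibble() %>%
  mutate(hp = na_if(hp, 110)) %>%          # same artificial NAs as above
  nest(data = -cyl) %>%
  mutate(wt_model = map(data, ~lm(mpg ~ wt, .x)),
         hp_model = map(data, ~lm(mpg ~ hp, .x)))

# rows available in each group vs. rows each model actually used
check <- nested %>%
  transmute(cyl,
            n_rows = map_int(data, nrow),
            n_wt   = map_int(wt_model, nobs),
            n_hp   = map_int(hp_model, nobs))

check
```

Any group where the `n_*` columns disagree will produce residual columns of different lengths, which can't be unnested side by side. On your data, the same idea would be `map_int(model_all, nobs)` compared against the other model columns.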

The simplest fix is to drop the incomplete observations before nesting:

library(tidyverse)

mtcars %>% as_tibble() %>% 
    mutate(hp = na_if(hp, 110)) %>% 
    drop_na(hp) %>%    # drop the rows with missing data
    nest(-cyl) %>% 
    mutate(wt_model = map(data, ~lm(mpg ~ wt, .x)), 
           hp_model = map(data, ~lm(mpg ~ hp, .x)), 
           wt_resid = map(wt_model, residuals), 
           hp_resid = map(hp_model, residuals)) %>% 
    unnest(wt_resid, hp_resid)
#> # A tibble: 29 x 3
#>      cyl wt_resid hp_resid
#>    <dbl>    <dbl>    <dbl>
#>  1     4  -3.67     -2.69 
#>  2     4   2.84     -4.59 
#>  3     4   1.02     -2.47 
#>  4     4   5.25      3.86 
#>  5     4  -0.0513    0.281
#>  6     4   4.69      5.25 
#>  7     4  -4.15     -3.54 
#>  8     4  -1.34     -1.24 
#>  9     4  -1.49      0.280
#> 10     4  -0.627     7.16 
#> # ... with 19 more rows
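If you'd rather keep every row, one alternative (swapped in here as a suggestion, not something described above) is to fit with `na.action = na.exclude`. The model still ignores the incomplete rows, but `residuals()` pads them back in as NA, so each residual vector has the same length as its group's data and the columns line up. This sketch assumes tidyr 1.0 or later for `nest(data = ...)` and `unnest(c(...))`; `res` is an illustrative name. It uses `residuals()` only; I haven't checked whether `broom::augment()` pads the same way.

```r
library(tidyverse)

res <- mtcars %>%
  as_tibble() %>%
  mutate(hp = na_if(hp, 110)) %>%          # same artificial NAs as above
  nest(data = -cyl) %>%
  mutate(
    # na.exclude: incomplete rows are left out of the fit, but residuals()
    # reinserts them as NA so the vector matches nrow(data)
    wt_resid = map(data, ~residuals(lm(mpg ~ wt, .x, na.action = na.exclude))),
    hp_resid = map(data, ~residuals(lm(mpg ~ hp, .x, na.action = na.exclude)))
  ) %>%
  select(cyl, wt_resid, hp_resid) %>%
  unnest(c(wt_resid, hp_resid))

res
```

All 32 rows survive, with NA in `hp_resid` wherever `hp` was missing. The models are still fit on different observations, so the comparability caveat still applies.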

You could also impute the missing data (if that's plausible). Or you could reinsert the NAs before unnesting, either by joining the augment() results back to the original data or by reshaping to long form, but such choices affect your ability to compare the models, since they'll be fit on different observations and thus have different degrees of freedom.
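Reshaping to long form could look roughly like this: instead of one column per model, pivot the model list-columns into one row per group and model, then unnest a single residual column, so differing lengths stop mattering. Again this is a sketch on mtcars with artificial NAs, assuming tidyr 1.0 or later for `pivot_longer()`; `long_resid`, `model` and `fit` are illustrative names.

```r
library(tidyverse)

long_resid <- mtcars %>%
  as_tibble() %>%
  mutate(hp = na_if(hp, 110)) %>%          # same artificial NAs as above
  nest(data = -cyl) %>%
  mutate(wt_model = map(data, ~lm(mpg ~ wt, .x)),
         hp_model = map(data, ~lm(mpg ~ hp, .x))) %>%
  # one row per cyl/model pair instead of one column per model
  pivot_longer(c(wt_model, hp_model), names_to = "model", values_to = "fit") %>%
  mutate(resid = map(fit, residuals)) %>%
  select(cyl, model, resid) %>%
  # only one list-column is unnested, so each model can have its own length
  unnest(resid)

count(long_resid, model)
```

Here `hp_model` contributes 29 residuals and `wt_model` 32, which is the differing-observations issue described above made explicit.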

All of the above assumes that NAs are why the augment() results have different numbers of rows, though. If something else is going on, you'll need to post a full reproducible example.

eoppe1022 · June 15, 2018, 2:54am

I think you're right. Thanks for the response! I really do appreciate it.