Extract slopes by group: broom? dplyr?

I want to know how to get the unique slopes by group from a dataset.

In base R I would use:
sapply(
  X = unique(Orange$Tree),
  FUN = function(z) {
    coef(lm(formula = age ~ circumference, subset = Tree == z, data = Orange))[2]
  }
)
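
For comparison, here is a minimal base R sketch that splits the data by tree first, so the slopes come back named by tree. It assumes only base R and the built-in Orange data; the names pieces and slopes_by_tree are just illustrative.

```r
# Split Orange into one data frame per tree (list names are the Tree levels).
pieces <- split(Orange, Orange$Tree)

# Fit one model per piece and keep only the circumference slope.
slopes_by_tree <- vapply(
  pieces,
  function(d) coef(lm(age ~ circumference, data = d))[["circumference"]],
  numeric(1)
)

slopes_by_tree["1"]  # slope for tree 1, looked up by name
```

Using vapply() with numeric(1) rather than sapply() makes the call fail loudly if any fit does not return exactly one number.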

I tried something with group_by, but I don't think broom recognized it:
Orange %>%
  group_by(Tree) %>%
  lm(formula = age ~ 1 + circumference, data = .) %>%
  tidy() %>%
  filter(term == "circumference") %>%
  select(estimate)


Your code runs for me, with the minor exception of the quotation marks in the filter statement not rendering well.
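
Note that the pipeline runs but returns a single slope: lm() does not know about dplyr groups, so it fits one model to all the rows. A quick sketch to check this, assuming dplyr is installed (grouped_fit and pooled_fit are illustrative names):

```r
library(dplyr)

# Fit through a grouped data frame, as in the original attempt.
grouped_fit <- Orange %>%
  group_by(Tree) %>%
  lm(formula = age ~ 1 + circumference, data = .)

# Fit to the whole dataset with no grouping at all.
pooled_fit <- lm(age ~ 1 + circumference, data = Orange)

# The coefficients match, so the grouping had no effect on the fit.
all.equal(coef(grouped_fit), coef(pooled_fit))
length(coef(grouped_fit))  # one intercept and one slope, not one per tree
```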

If you're looking to do this for every tree, you need to look into tidyr, purrr and nesting your data into list-columns. What's going on below is:

  1. Group your data by Tree.
  2. Nest those groups into a list-column.
  3. Create a model column by mapping lm, then tidy, over the data list-column.
  4. Unnest the model list-column.
  5. Filter the result to the circumference term.

library(tidyverse) #for purrr, tidyr and dplyr
library(broom)

Orange %>% 
  group_by(Tree) %>% 
  nest() %>% 
  mutate(model = map(data, ~lm(age ~ 1 + circumference, data = .x) %>% 
                       tidy)) %>% 
  unnest(model) %>% 
  filter(term == 'circumference')

Which results in:

# A tibble: 5 x 6
   Tree          term  estimate std.error statistic      p.value
  <ord>         <chr>     <dbl>     <dbl>     <dbl>        <dbl>
1     1 circumference 11.919245 0.9188029  12.97258 4.851902e-05
2     2 circumference  7.795225 0.5595479  13.93129 3.425041e-05
3     3 circumference 12.038885 0.8353445  14.41188 2.901046e-05
4     4 circumference  7.169842 0.5719516  12.53575 5.733090e-05
5     5 circumference  8.787132 0.6211365  14.14686 3.177093e-05

You can then extract what you need using pull from dplyr or any other method you prefer.
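
As a rough sketch of that last step, assuming dplyr, tidyr, purrr and broom are installed (slopes, slope_values and slope_named are illustrative names):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same nested-model pipeline as above, kept as a data frame of slopes.
slopes <- Orange %>%
  group_by(Tree) %>%
  nest() %>%
  mutate(model = map(data, ~ tidy(lm(age ~ 1 + circumference, data = .x)))) %>%
  unnest(model) %>%
  filter(term == "circumference") %>%
  ungroup()

# pull() returns the estimate column as a plain numeric vector.
slope_values <- pull(slopes, estimate)

# Naming the vector by tree avoids relying on row order.
slope_named <- setNames(slope_values, as.character(slopes$Tree))
slope_named["1"]
```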


For some reason your code doesn't work for me.
First I get:
Error in mutate_impl(.data, dots) :
Evaluation error: could not find function "map".

Then, when I use dplyr:::map instead, I get:
Error in mutate_impl(.data, dots) :
Evaluation error: object '.f' of mode 'function' was not found.

Sorry, I forgot to mention that purrr is also needed, as it provides the map function.
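
If you'd rather not depend on which packages happen to be attached, one option is to namespace-qualify the calls. A minimal sketch, assuming dplyr, tidyr, purrr and broom are installed (only dplyr is attached here, for the pipe and verbs):

```r
library(dplyr)  # provides %>%, group_by(), mutate(), filter()

slopes <- Orange %>%
  group_by(Tree) %>%
  tidyr::nest() %>%
  # purrr::map() and broom::tidy() are called by their full names,
  # so a missing library() call cannot cause "could not find function".
  mutate(model = purrr::map(
    data,
    ~ broom::tidy(lm(age ~ 1 + circumference, data = .x))
  )) %>%
  tidyr::unnest(model) %>%
  filter(term == "circumference")
```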

Depending on the rest of your workflow, it may make more sense to skip the nested data frame:

library(tidyverse) #for purrr, tidyr and dplyr
library(broom)

Orange %>% 
  split(.$Tree) %>% 
  map(~lm(age ~ 1 + circumference, data = .x)) %>% 
  map_df(tidy) %>%
  filter(term == 'circumference')

Excellent solution, Hadley!
Is there a way to have a column that shows the Tree number with the output?

Pass a .id parameter to map_df to convert the names into a column:

library(tidyverse)

Orange %>%
    split(.$Tree) %>%
    map(~lm(age ~ 1 + circumference, data = .x)) %>%
    map_df(broom::tidy, .id = 'tree') %>%
    filter(term == 'circumference')
#>   tree          term  estimate std.error statistic      p.value
#> 1    3 circumference 12.038885 0.8353445  14.41188 2.901046e-05
#> 2    1 circumference 11.919245 0.9188029  12.97258 4.851902e-05
#> 3    5 circumference  8.787132 0.6211365  14.14686 3.177093e-05
#> 4    2 circumference  7.795225 0.5595479  13.93129 3.425041e-05
#> 5    4 circumference  7.169842 0.5719516  12.53575 5.733090e-05

tree is now character instead of an out-of-order ordinal factor, though, so it'll need more cleaning.
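
One way to do that cleaning, sketched under the assumption that you want the original ordered factor back (reusing the levels stored in Orange$Tree):

```r
library(dplyr)
library(purrr)
library(broom)

slopes <- Orange %>%
  split(.$Tree) %>%
  map(~ lm(age ~ 1 + circumference, data = .x)) %>%
  map_df(tidy, .id = "tree") %>%
  filter(term == "circumference") %>%
  # Rebuild the ordered factor using the level order from the source data.
  mutate(tree = factor(tree, levels = levels(Orange$Tree), ordered = TRUE))

class(slopes$tree)
levels(slopes$tree)
```

If a plain number is more useful than the factor, as.integer() on the character column is an alternative.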


Note to all: when I saw the solution with a ~ in map (which I hadn't used before), I went to ?purrr::map and saw that pretty much the exact example I needed was already laid out:
# A more realistic example: split a data frame into pieces, fit a
# model to each piece, summarise and extract R^2
mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")

Thank you @alistaire