I want to know how to get the unique slopes by group from a dataset.
In base R I would use:
```r
sapply(X = unique(Orange$Tree), FUN = function(z) {
  coef(lm(formula = age ~ circumference, subset = Tree == z, data = Orange))[2]
})
```
I tried something with `group_by`, but I don't think broom recognized it:
```r
library(dplyr)
library(broom)

Orange %>%
  group_by(Tree) %>%
  lm(formula = age ~ 1 + circumference, data = .) %>%
  tidy() %>%
  filter(term == "circumference") %>%
  select(estimate)
```
Your code works for me, apart from the quotation marks in the `filter` statement, which did not print well.
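One thing worth checking: `lm()` does not use the groups created by `group_by()`, so the grouped pipeline fits a single model to all trees pooled together. A minimal sketch to confirm this, assuming `dplyr` and `broom` are installed (the names `grouped_slope` and `pooled_slope` are only illustrative):

```r
library(dplyr)  # group_by(), filter(), pull() and the %>% pipe
library(broom)  # tidy()

# The pipeline from the question: lm() receives the grouped data frame but
# ignores the grouping, so it returns a single set of coefficients.
grouped_slope <- Orange %>%
  group_by(Tree) %>%
  lm(formula = age ~ 1 + circumference, data = .) %>%
  tidy() %>%
  filter(term == "circumference") %>%
  pull(estimate)

# One model fitted to all trees at once, with no grouping at all
pooled_slope <- coef(lm(age ~ circumference, data = Orange))[["circumference"]]

length(grouped_slope)                   # a single slope, not one per tree
all.equal(grouped_slope, pooled_slope)  # same value as the pooled fit
```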
If you're looking to do this for every tree, you need to look into `tidyr`, `purrr` and nesting your data into list-columns. What's going on below is:

1. Group your data by `Tree`.
2. Nest those groups into a list-column.
3. Create a `model` column that maps the `lm` function, then `tidy`, over the `data` list-column.
4. Unnest the `model` list-column.
5. Filter it to `circumference`.
```r
library(tidyverse) # for purrr, tidyr and dplyr
library(broom)

Orange %>%
  group_by(Tree) %>%
  nest() %>%
  # fit one model per tree and tidy it into a data frame of coefficients
  mutate(model = map(data, ~ lm(age ~ 1 + circumference, data = .x) %>% tidy())) %>%
  unnest(model) %>%
  filter(term == 'circumference')
```
Which results in:
```
# A tibble: 5 x 6
    Tree          term  estimate std.error statistic      p.value
   <ord>         <chr>     <dbl>     <dbl>     <dbl>        <dbl>
1      1 circumference 11.919245 0.9188029  12.97258 4.851902e-05
2      2 circumference  7.795225 0.5595479  13.93129 3.425041e-05
3      3 circumference 12.038885 0.8353445  14.41188 2.901046e-05
4      4 circumference  7.169842 0.5719516  12.53575 5.733090e-05
5      5 circumference  8.787132 0.6211365  14.14686 3.177093e-05
```
You can then extract what you need using `pull` from `dplyr`, or any other method you prefer.
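As a sketch of that last step, assuming a recent dplyr/tidyr (the object names `slopes` and `slope_vec` are illustrative): `pull()` returns the `estimate` column as a plain numeric vector, and base `setNames()` labels it with the tree ids so a slope can be looked up by tree.

```r
library(tidyverse)  # dplyr, tidyr and purrr
library(broom)

slopes <- Orange %>%
  group_by(Tree) %>%
  nest() %>%
  mutate(model = map(data, ~ tidy(lm(age ~ 1 + circumference, data = .x)))) %>%
  unnest(model) %>%
  filter(term == "circumference")

# pull() extracts a single column as a vector; name it by tree for lookup
slope_vec <- setNames(pull(slopes, estimate), as.character(pull(slopes, Tree)))
slope_vec[["1"]]  # slope for tree 1
```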
For some reason your code doesn't work for me. First I get:
```
Error in mutate_impl(.data, dots) :
  Evaluation error: could not find function "map".
```
Then, when I add `dplyr:::map`, I get:
```
Error in mutate_impl(.data, dots) :
  Evaluation error: object '.f' of mode 'function' was not found.
```
Sorry, I forgot to mention that `purrr` is also used, as it provides the `map` function.
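If you'd rather not attach the whole tidyverse, here is a sketch of the same pipeline that attaches only `dplyr`, `tidyr` and `broom` and calls `map()` through its namespace as `purrr::map()`. It assumes `purrr` is installed (just not attached); `result` is an illustrative name.

```r
library(dplyr)  # group_by(), mutate(), filter() and the %>% pipe
library(tidyr)  # nest(), unnest()
library(broom)  # tidy()

# purrr is not attached, so map() is called with its namespace prefix;
# the ~ shorthand still works because purrr::map() interprets it itself.
result <- Orange %>%
  group_by(Tree) %>%
  nest() %>%
  mutate(model = purrr::map(data, ~ tidy(lm(age ~ 1 + circumference, data = .x)))) %>%
  unnest(model) %>%
  filter(term == "circumference")
result
```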
hadley
November 13, 2017, 7:47pm
Depending on the rest of your workflow, it may make more sense to skip the nested data frame:
```r
library(tidyverse) # for purrr, tidyr and dplyr
library(broom)

Orange %>%
  split(.$Tree) %>%
  map(~ lm(age ~ 1 + circumference, data = .x)) %>%
  map_df(tidy) %>%
  filter(term == 'circumference')
```
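For comparison with the base R code in the question, `split()` + `map()` has a close base R analogue in `split()` + `sapply()`. A sketch (the names `pieces` and `slopes` are illustrative); because `split()` on a factor names the list by its levels, the slopes come back labelled by tree, in level order.

```r
# Split Orange into one data frame per tree (a list named by the Tree levels),
# fit a model to each piece and keep only the circumference coefficient.
pieces <- split(Orange, Orange$Tree)
slopes <- sapply(pieces, function(d) {
  coef(lm(age ~ 1 + circumference, data = d))[["circumference"]]
})
slopes
```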
Excellent solution, Hadley!
Is there a way to have a column that shows the Tree number with the output?
Pass a `.id` parameter to `map_df` to convert the names into a column:
```r
library(tidyverse)

Orange %>%
  split(.$Tree) %>%
  map(~ lm(age ~ 1 + circumference, data = .x)) %>%
  map_df(broom::tidy, .id = 'tree') %>%
  filter(term == 'circumference')
#>   tree          term  estimate std.error statistic      p.value
#> 1    3 circumference 12.038885 0.8353445  14.41188 2.901046e-05
#> 2    1 circumference 11.919245 0.9188029  12.97258 4.851902e-05
#> 3    5 circumference  8.787132 0.6211365  14.14686 3.177093e-05
#> 4    2 circumference  7.795225 0.5595479  13.93129 3.425041e-05
#> 5    4 circumference  7.169842 0.5719516  12.53575 5.733090e-05
```
`tree` is now a character column rather than the original (out-of-order) ordered factor, though, so it will need some more cleaning.
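One possible cleanup, sketched here: convert `tree` back into an ordered factor with the same levels as `Orange$Tree`. This assumes you want to keep the dataset's original level order; if plain tree numbers are enough, `as.integer(tree)` is a simpler alternative. (`slopes` is an illustrative name.)

```r
library(tidyverse)

slopes <- Orange %>%
  split(.$Tree) %>%
  map(~ lm(age ~ 1 + circumference, data = .x)) %>%
  map_df(broom::tidy, .id = "tree") %>%
  filter(term == "circumference") %>%
  # .id stores the list names as character; restore Orange's ordered factor
  mutate(tree = factor(tree, levels = levels(Orange$Tree), ordered = TRUE))
slopes
```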
Note to all: when I saw the solution with a `~` in `map` (which I hadn't used before), I looked at `?purrr::map` and found almost exactly the example I needed already laid out:
```r
library(purrr)

# A more realistic example: split a data frame into pieces, fit a
# model to each piece, summarise and extract R^2
mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")
```
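In that example, the one-sided formula `~ lm(mpg ~ wt, data = .x)` is purrr's shorthand for an anonymous function whose argument is `.x`. A sketch applying the same pattern to the Orange slopes, with an explicit `function(d)` version alongside for comparison (the object names are illustrative):

```r
library(purrr)

pieces <- split(Orange, Orange$Tree)

# Two equivalent ways to fit one model per tree:
models_formula  <- map(pieces, ~ lm(age ~ circumference, data = .x))           # ~ shorthand
models_function <- map(pieces, function(d) lm(age ~ circumference, data = d))  # explicit function

# map_dbl() keeps the list names, giving a named numeric vector of slopes
slopes <- map_dbl(models_formula, ~ coef(.x)[["circumference"]])
slopes
```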