Relative frequencies with dplyr

Hi All,
My post is inspired by this post on SO:
https://stackoverflow.com/questions/24576515/relative-frequencies-proportions-with-dplyr

My question is: why do the relative frequencies in these examples not add up to 1 (or to 100% if we multiply by 100)?

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154

Here they say it should:
https://www.quora.com/Why-is-the-sum-of-relative-frequency-always-one

Am I misunderstanding something?

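You need either an ungroup() after the summarise() or .groups = "drop" in the summarise(). With two grouping variables, summarise() only drops the last one (gear), so the result is still grouped by am and the mutate() computes freq within each am group. That is why the frequencies add up to 1 within each value of am rather than over the whole table.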

You should have seen a message alerting you that the output is still grouped:

`summarise()` has grouped output by 'am'. You can override using the `.groups` argument.
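
For comparison, here is a minimal sketch of the grouped and ungrouped versions side by side, using the same mtcars pipeline as the question. It assumes dplyr 1.0.0 or later (where the .groups argument is available); the names by_am and overall are only illustrative.

```r
library(dplyr)

# Still grouped by am after summarise(): freq is computed within each am group.
by_am <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(freq = n / sum(n))

# Grouping dropped entirely: freq is computed over the whole table.
overall <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(freq = n / sum(n))

sum(by_am$freq)    # one full unit per am group, so about 2 in total
sum(overall$freq)  # about 1
```

Calling ungroup() between summarise() and mutate() should give the same result as .groups = "drop". Which one you want depends on the question you are asking: proportions of gear within each am, or proportions of each am/gear combination across all cars.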

Great advice, it works now, thank you.
