# group_by and summarise: improve the code

Suppose we need to calculate the relative abundance (percentage) of positives and negatives within each year.
The approach I have been using is as follows:

```r
library(tidyverse)

# fake data
set.seed(1)
mydf <- tibble(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

# code
mydf %>%
  group_by(year) %>%
  mutate(count_by_year = n()) %>%  # total for each year
  ungroup() %>%
  group_by(year, result) %>%
  summarise(
    count_year_res = n(),                        # number of positives and negatives in each year
    perc = count_year_res / count_by_year * 100  # relative abundance
  ) %>%
  unique()
```

To avoid the following warning, I can replace `summarise()` with `reframe()`, and everything works as expected:

```
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
```
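As far as I can tell, the warning comes from the fact that `count_by_year` is created with `mutate()`, so after regrouping by `year` and `result` it is still a full column with one value per row. The expression `count_year_res / count_by_year * 100` therefore returns one value per row instead of one per group. The minimal sketch below checks this on a hypothetical three-row data set (`toy` and `perc_length` are illustrative names, not from the original post) by returning only the length of that expression in each group:

```r
library(dplyr)

# Hypothetical toy data: in 2000 there are two negatives and one positive
toy <- tibble(
  year   = c(2000L, 2000L, 2000L),
  result = c("negative", "negative", "positive")
)

lens <- toy %>%
  group_by(year) %>%
  mutate(count_by_year = n()) %>%  # yearly total, repeated on every row
  group_by(year, result) %>%
  summarise(
    # length of the per-group percentage vector: one value per row, not per group
    perc_length = length(n() / count_by_year * 100),
    .groups = "drop"
  )

lens
```

For the two negatives the expression has length 2, so `summarise()` would return two identical rows for that group; `reframe()` allows this, and `unique()` then collapses the duplicates.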

Is there a better way to get this result without using `reframe()` followed by `unique()`?
Which approach do you usually use to keep the code clean?

Result:

```
> mydf %>%
+   group_by(year) %>%
+   mutate(count_by_year = n()) %>%
+   ungroup() %>%
+   group_by(year, result) %>%
+   reframe(count_year_res = n(),
+             perc = count_year_res/count_by_year*100) %>%
+   unique()
# A tibble: 21 × 4
    year result   count_year_res  perc
   <int> <chr>             <int> <dbl>
 1  2000 negative              6  85.7
 2  2000 positive              1  14.3
 3  2001 negative              3  60
 4  2001 positive              2  40
 5  2002 negative              2  40
 6  2002 positive              3  60
 7  2003 negative              3  50
 8  2003 positive              3  50
 9  2004 negative              4  57.1
10  2004 positive              3  42.9
# … with 11 more rows
# ℹ Use `print(n = ...)` to see more rows
```

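I think that if you move `unique()` inside `summarise()`, you can keep `summarise()` and get the same result; at least on this example data, I don't get the deprecation warning. The `ungroup()` call can also be dropped, because `group_by()` replaces the existing grouping by default.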

```r
mydf %>%
  group_by(year) %>%
  mutate(count_by_year = n()) %>%  # total for each year
  group_by(year, result) %>%
  summarise(
    count_year_res = n(),  # number of positives and negatives in each year
    perc = unique(count_year_res / count_by_year * 100)
  )
```
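For comparison, here is a sketch of a pattern that avoids both `reframe()` and `unique()` by counting first and computing the percentages afterwards. It is an alternative suggestion, not the approach used in either post above, and it loads only `dplyr` rather than the whole tidyverse. `count(year, result, name = "count_year_res")` returns one row per year/result combination; grouping again by `year` lets `sum(count_year_res)` supply the yearly total, so every expression yields exactly one value per row. The second pipeline does the same with `summarise()` and `.groups = "drop_last"`, which drops only the `result` level and leaves the data grouped by `year` for the following `mutate()`. The object names `perc_count` and `perc_summ` are hypothetical.

```r
library(dplyr)

# Same fake data as in the question
set.seed(1)
mydf <- tibble(
  id     = 1:80,
  year   = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

# Option 1: count() gives one row per (year, result) combination
perc_count <- mydf %>%
  count(year, result, name = "count_year_res") %>%
  group_by(year) %>%                                         # yearly totals
  mutate(perc = count_year_res / sum(count_year_res) * 100) %>%
  ungroup()

# Option 2: summarise(), then drop only the last grouping level (result),
# so the following mutate() still operates within each year
perc_summ <- mydf %>%
  group_by(year, result) %>%
  summarise(count_year_res = n(), .groups = "drop_last") %>%
  mutate(perc = count_year_res / sum(count_year_res) * 100) %>%
  ungroup()

perc_count
```

Both versions should give the same counts and percentages. `count()` is more compact, while the `summarise()` form is easier to extend with other per-group statistics; the final `ungroup()` keeps the `year` grouping from leaking into later steps.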
