6 months later, the .groups argument in summarise and the familiar warning message

I've gotten used to sometimes putting the .groups = "drop" argument in summarize and sometimes putting options(dplyr.summarise.inform = FALSE) at the start of my files, but I must say that ~6 months since the change, I still find this message irritating.

I almost never want to maintain the grouping past the summarize() call, and have a long-ingrained habit of putting an ungroup() call immediately afterwards. This is probably how most of us were taught. In these cases, the message is both useless and a source of clutter.

Maybe this is extreme, but I think I might actually prefer it if summarize() always dropped the grouping. In the rare cases where I need to continue performing operations on the grouped data frame, I could always just group again. For the other 99% of cases, it would save me code. I wouldn't even need the ungroup() or the .groups = "drop".
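For reference, here is a minimal sketch of what that looks like today with the existing .groups = "drop" argument, and of regrouping afterwards in the rare case it is needed. The names with_drop, regrouped and mean_hp are only illustrative.

```r
library(dplyr)

# .groups = "drop" returns an ungrouped tibble and emits no message,
# so no trailing ungroup() is needed.
with_drop <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_hp = mean(hp), .groups = "drop")

is_grouped_df(with_drop)
#> [1] FALSE

# If grouping is needed again later, it is one explicit group_by() call.
regrouped <- with_drop %>% group_by(cyl)
group_vars(regrouped)
#> [1] "cyl"
```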

The old topics on this are closed, so with the benefit of 6 months' experience, can we have the dialog again?

library(tidyverse)
mtcars %>% group_by(cyl, gear) %>% summarize(mean(hp)) %>% ungroup()
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
#> # A tibble: 8 x 3
#>     cyl  gear `mean(hp)`
#>   <dbl> <dbl>      <dbl>
#> 1     4     3        97 
#> 2     4     4        76 
#> 3     4     5       102 
#> 4     6     3       108.
#> 5     6     4       116.
#> 6     6     5       175 
#> 7     8     3       194.
#> 8     8     5       300.

Created on 2021-06-03 by the reprex package (v1.0.0)


Have you considered dropping options(dplyr.summarise.inform = FALSE) into your .Rprofile?
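As a rough sketch, the line would go in either the user-level ~/.Rprofile or an .Rprofile in the project directory. Note that this only silences the message. The grouping behaviour of summarise() itself is unchanged.

```r
# .Rprofile
# Silence the "`summarise()` has grouped output by ..." message in every session.
# This does not change which groups summarise() keeps or drops.
options(dplyr.summarise.inform = FALSE)
```

One thing to keep in mind: scripts shared with others will still show the message on their machines unless they set the same option.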

We have recently limited the cases when the message appears, but not this particular case. Going from 2 grouping variables to only one has always been a source of friction, so we opted for messaging about it, in the hope that it clarifies what is happening.
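To make the "2 grouping variables to only one" point concrete, here is a small sketch that inspects the grouping before and after summarise(). suppressMessages() is used only to keep the output quiet, and the object names are illustrative.

```r
library(dplyr)

by_cyl_gear <- mtcars %>% group_by(cyl, gear)

# Two grouping variables going in: cyl and gear.
group_vars(by_cyl_gear)

# With no .groups argument, the last grouping variable (gear) is dropped
# and the result stays grouped by cyl.
res <- suppressMessages(summarise(by_cyl_gear, mean_hp = mean(hp)))
group_vars(res)
#> [1] "cyl"
```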

Maybe if we were designing from scratch, we could always ungroup() but that is not currently possible as it probably would break a lot of code.

However, if you always want summarise() to ungroup() and be silent about it, perhaps you can use a wrapper function:

library(tidyverse)

summarise2 <- function(.data, ..., .groups = "drop") {
  dplyr::summarise(.data, ..., .groups = .groups)
}

mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise2(mean(hp))
#> # A tibble: 8 x 3
#>     cyl  gear `mean(hp)`
#>   <dbl> <dbl>      <dbl>
#> 1     4     3        97 
#> 2     4     4        76 
#> 3     4     5       102 
#> 4     6     3       108.
#> 5     6     4       116.
#> 6     6     5       175 
#> 7     8     3       194.
#> 8     8     5       300.

Created on 2021-06-03 by the reprex package (v2.0.0)
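Because .groups is an ordinary argument with a default, the wrapper can still be overridden per call. A short sketch, repeating the summarise2() definition so it runs on its own (res and mean_hp are illustrative names):

```r
library(dplyr)

summarise2 <- function(.data, ..., .groups = "drop") {
  dplyr::summarise(.data, ..., .groups = .groups)
}

# Ask for the original grouping to be kept in this one call.
res <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise2(mean_hp = mean(hp), .groups = "keep")

# Both grouping variables are retained.
group_vars(res)
```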

