`summarise()` regrouping output by . . . Has this happened to anyone else in summarise()?

I updated all my packages and summarise() has changed behaviour.

As I run my code I now get lots of messages that say:

summarise() regrouping output by 'oppId' (override with .groups argument)

The effect is to group things by something I don't want it to group them by, with the final result that a 20-row data frame is ending up at 63,468 rows.

Is there any way to turn off this behaviour?

The groups (or lack of groups) that summarise() processes are always determined by the presence or absence of prior group_by() calls on the data frame passed to summarise().
If you are getting the 'wrong' groupings, then you should investigate your use, or lack of use, of group_by().
That said, both the message and the final grouping of summarise()'s output (that is, the group_by() settings on the summarised data frame) can be easily controlled with .groups, as the message advises. Here is an example you can look at.
The iris dataset has 150 records, 50 for each species:

library(tidyverse)
iris %>% summarise(n=n())
iris %>% group_by(Species) %>% summarise(n=n())
iris %>% group_by(Species) %>% summarise(n=n(), .groups = "keep")
iris %>% group_by(Species) %>% summarise(n=n(), .groups = "drop")
iris %>% group_by(Species) %>% summarise(n=n(), .groups = "drop_last")
iris %>% group_by(Species) %>% summarise(n=n(), .groups = "rowwise")
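
With a single grouping variable the difference between the .groups options is hard to see in the printed output. A minimal sketch with two grouping variables, using mtcars and group_vars() to inspect what grouping is left on each result (the dataset and the object names are only illustrative, not from the original question):

```r
library(dplyr)

# Two grouping variables make the effect of .groups visible.
by_cyl_gear <- mtcars %>% group_by(cyl, gear)

# No .groups given: dplyr prints the regrouping message and drops the last group.
default_out   <- by_cyl_gear %>% summarise(n = n())
drop_last_out <- by_cyl_gear %>% summarise(n = n(), .groups = "drop_last")
drop_out      <- by_cyl_gear %>% summarise(n = n(), .groups = "drop")
keep_out      <- by_cyl_gear %>% summarise(n = n(), .groups = "keep")

# Inspect the grouping that remains on each result.
group_vars(default_out)    # "cyl"
group_vars(drop_last_out)  # "cyl"
group_vars(drop_out)       # character(0)
group_vars(keep_out)       # "cyl" "gear"
```

If later steps should see a plain, ungrouped data frame, "drop" is usually the option to reach for.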

Have you added an ungroup() before your summarise()?

No. That's a new one on me - where does ungroup() go?

Thing is, all this code worked 3 months ago. I'll look this up

In the case that you have described, I would place ungroup() in the line just above summarise().
There is no need to pass any arguments to ungroup() if you want everything ungrouped, assuming you are piping your data frame.
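
As a rough illustration of where ungroup() goes, here is a sketch on iris (the real data and column names from the question are unknown, so these are stand-ins):

```r
library(dplyr)

grouped <- iris %>% group_by(Species)

# Still grouped: summarise() returns one row per Species.
per_group <- grouped %>%
  summarise(mean_len = mean(Sepal.Length))

# ungroup() on the line just above summarise(): one row for the whole data frame.
overall <- grouped %>%
  ungroup() %>%
  summarise(mean_len = mean(Sepal.Length))

nrow(per_group)  # 3
nrow(overall)    # 1
```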

I ran into this a while ago, but didn't encounter any breakage from it; just the messages. The previous behavior was to always do what's now .groups = "drop_last". As of dplyr 1.0 that is still the default if all results have one row, but if the number of rows varies it uses "keep". Try setting .groups explicitly and see which value gets the output you expected. Without a reprex it's hard to know what's actually happening.
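
Without a reprex this is only a guess, but one way extra rows can appear is when a leftover group from an earlier summarise() carries into a later step. A small sketch on mtcars (illustrative data, not the asker's) comparing "drop_last" and "drop":

```r
library(dplyr)

# "drop_last" leaves the result grouped by cyl.
counts <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop_last")

# The leftover group means this returns one row per cyl, not one row overall.
totals_grouped <- counts %>% summarise(total = sum(n))

# "drop" returns an ungrouped result, so the next summarise() collapses to one row.
counts_flat <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

totals_flat <- counts_flat %>% summarise(total = sum(n))

nrow(totals_grouped)  # 3
nrow(totals_flat)     # 1

# This option only silences the message; it does not change the grouping.
options(dplyr.summarise.inform = FALSE)
```

Setting .groups explicitly in each summarise() call also stops the message for that call, which makes the intended grouping visible in the code.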
