dplyr 1.0.0 summarize/summarise inform warnings

How are folks finding the change in dplyr 1.0's summarize function, which now emits a warning whenever it falls back on the default behavior of dropping one layer of grouping?

All my legacy code now emits lots of extra messages, cluttering up the output and generally offending my sensibilities. :wink: I know there's the dplyr.summarise.inform option (and have that set), though I have to set that in all the different environments in which I'm working (docker environments, testing environments, etc.). Yes, there's renv and proper dependency management as well. :stuck_out_tongue:
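For anyone who hasn't met the option yet, a minimal sketch of setting it is below. Where it goes (a project-level .Rprofile, the top of a script, a test setup file) is an assumption about your setup, not a recommendation.

```r
# Silence the summarise() grouping message for the current R session.
# The option name uses the British spelling "summarise".
options(dplyr.summarise.inform = FALSE)

# Confirm the option is set
getOption("dplyr.summarise.inform")
```

This only affects the session in which it runs, which is why it has to be repeated in each environment.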

I can't decide if I'm being a traditionalist and need to get over myself here or if this is worth raising an issue on the dplyr repo to talk about this change in depth. Talk me off my ledge? :slight_smile:


I tend to agree with your sentiment. Right now I'm ignoring the warnings, but they do clog up output logs for automated runs (thanks for highlighting dplyr.summarise.inform though).

My take is that the dropping of the last layer was always undesirable behaviour, which this tries to address through the warnings without changing the default. I have always used ungroup() to remove the grouping explicitly, so the warnings are unnecessary, but I now have to choose whether to continue with the ungrouping or adopt the new summarise() capability with diverging coding styles over time. Not a big problem in the great scheme of things.


Dropping the last grouping level made perfect sense before because summarise() used to only return a single row per group. But with dplyr 1.0.0, summarise() can return multiple rows and it's this change that makes the default behaviour sometimes undesirable.
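To make the default concrete, here is a small sketch using the built-in mtcars data (the dataset and column choices are just for illustration). Grouping by two variables and summarising leaves the result grouped by the first one only:

```r
library(dplyr)

# Group by two variables, then summarise to one row per group.
# With no .groups argument, the last grouping variable (gear) is dropped
# and dplyr 1.0.0 prints a message saying so.
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg))

# The result is still grouped, but only by cyl
group_vars(by_cyl_gear)
# "cyl"
```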

Personally, I always explicitly ungroup() after completing my grouped operations, so it's easy for me to go back and add an appropriate .groups argument to my summarise() calls in order to suppress the message. But I found this major change a good opportunity to revisit my old scripts and see if there are better ways of doing tasks (such as with the new relocate() verb). Maybe I just have too much free time. :sweat_smile:
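As a rough sketch of what that looks like (again using mtcars purely as an example), stating .groups explicitly suppresses the message, and "drop" gives the same result as a trailing ungroup():

```r
library(dplyr)

# .groups = "drop" returns an ungrouped result and prints no message
dropped <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# .groups = "drop_last" keeps the old default (still grouped by cyl),
# but because it is stated explicitly there is no message either
kept_cyl <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop_last")

is_grouped_df(dropped)   # FALSE
group_vars(kept_cyl)     # "cyl"
```

The other accepted values are "keep" and "rowwise"; which one fits depends on what the next step in the pipeline expects.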


I understand the reasoning … it's just that I disagree that it was ever desirable and it was behaviour which caught out many beginners. I also seem to recall that it did not always work well using dbplyr.

Anyway, like you I have to decide whether it's worth refreshing old code, which would not look great to somebody looking at it with the benefit of the many changes introduced over the last couple of years. I'll probably leave it unless something breaks.

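I was a bit surprised by these messages initially, but have got used to them. Good to know that options(dplyr.summarise.inform = FALSE) helps with getting rid of the messages (note that the option name uses the "summarise" spelling).
I think .groups = "drop_last" would also achieve the same thing, since it states the default explicitly.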

Thanks
~Arnab

The message did one important thing for me, so I will thank the developers. I had not been aware of the default dropping behavior. Of course I should have been -- "No excuse Sir". So now I know, and reexamining my old code was a good idea. Turns out the default was always OK in my use, and .groups = "drop" would also work. My solution was to simply reexamine my old code where it is still alive: about 50 summarize calls, mostly in personal functions that are called often. I systematically changed the code.

I do not like anything that seems like a global turn off of warnings. I want to see those warnings, and go back to the source to address and remove them.