dplyr and group_by and group_split problem

jfca283 · August 1, 2020, 1:03am

Hello,
I'm trying to replicate a pivot table, so I am using dplyr and group_by().
The code below, I think, sums "hp" aggregated by "cyl" and "mpg". It also provides the percentage for each row.

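library(dplyr)
kk <- datasets::mtcars
names(kk)
kk
kk %>% group_by(cyl, mpg) %>% 
  summarise(st_ = sum(hp)) %>%          # total hp per cyl/mpg combination
  mutate(s_st = st_ / sum(st_) * 100)   # share of hp within each cyl (summarise() leaves the result grouped by cyl)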
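For context on what that percentage is relative to: `summarise()` removes only the last grouping variable (`mpg`), so its result is still grouped by `cyl`, and `sum(st_)` inside `mutate()` is therefore a per-`cyl` total. A minimal sketch that makes this visible, assuming dplyr 1.0.0 or later (for the `.groups` argument):

```r
library(dplyr)

kk <- datasets::mtcars

by_cyl_mpg <- kk %>%
  group_by(cyl, mpg) %>%
  # "drop_last" is what summarise() does by default here; spelled out for clarity
  summarise(st_ = sum(hp), .groups = "drop_last")

group_vars(by_cyl_mpg)  # "cyl": only the last grouping variable (mpg) was dropped

pct <- by_cyl_mpg %>%
  mutate(s_st = st_ / sum(st_) * 100)  # sum(st_) is computed separately for each cyl

# The percentages add up to 100 within each cyl, not across the whole table
totals <- pct %>% summarise(total = sum(s_st))
totals
```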

However, if I want to create the same table but split the analysis by "am", I get this:

kk %>% group_by(cyl,mpg) %>% 
  summarise(st_=sum(hp)) %>% 
  mutate(s_st=st_/sum(st_)*100) %>%   group_split(am)


Warning message:
... is ignored in group_split(<grouped_df>), please use group_by(..., .add = TRUE) %>% group_split()

I don't know what I am doing wrong. Can you guide me?
Thanks for your help.
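Two things are happening in that pipeline. First, when `group_split()` receives data that is already grouped (here by `cyl`, left over from `summarise()`), it ignores whatever is passed in `...` and splits by the existing groups, which is what the warning says. Second, `summarise()` keeps only the grouping columns and the new summary columns, so `am` no longer exists at that point anyway. A short sketch showing both, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

kk <- datasets::mtcars

res <- kk %>%
  group_by(cyl, mpg) %>%
  summarise(st_ = sum(hp), .groups = "drop_last") %>%
  mutate(s_st = st_ / sum(st_) * 100)

# summarise() keeps only the grouping columns plus the new summary columns
names(res)            # "cyl" "mpg" "st_" "s_st": no "am" column is left
"am" %in% names(res)  # FALSE

# On a grouped tibble, group_split() splits by the existing groups (cyl);
# suppressWarnings() only hides the "... is ignored" warning quoted above
parts <- suppressWarnings(group_split(res, am))
length(parts)         # 3: one tibble per cyl value, not per am value
```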

ries9112 · August 1, 2020, 1:18am

I am unclear on what you are trying to accomplish, but if you were trying to recreate the same analysis for the am field, I would think you would replace the previous grouping variables with am, like so:

kk=datasets::mtcars
names(kk)
kk
kk %>% group_by(am) %>% 
    summarise(st_=sum(hp)) %>% 
    mutate(s_st=st_/sum(st_)*100)

Which produces a two-row table, one row per value of am, with the total hp (st_) and its percentage of the overall total (s_st).

If you wanted to add the am field to the rest of the grouped fields from your first output, you could add it as one of the variables being grouped by:

kk=datasets::mtcars
names(kk)
kk
kk %>% group_by(cyl, mpg, am) %>% 
    summarise(st_=sum(hp)) %>% 
    mutate(s_st=st_/sum(st_)*100)
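One detail worth checking with this version: because `summarise()` drops only the last grouping variable, `group_by(cyl, mpg, am)` leaves the result grouped by `cyl` and `mpg`, so `s_st` becomes each transmission type's share within a single `cyl`/`mpg` cell. Listing `am` first instead leaves the result grouped by `am` and `cyl`, which gives the original per-`cyl` percentages separately for each value of `am`. A minimal sketch comparing the two orders, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

kk <- datasets::mtcars

am_last <- kk %>%
  group_by(cyl, mpg, am) %>%
  summarise(st_ = sum(hp), .groups = "drop_last")      # still grouped by cyl, mpg

am_first <- kk %>%
  group_by(am, cyl, mpg) %>%
  summarise(st_ = sum(hp), .groups = "drop_last") %>%  # still grouped by am, cyl
  mutate(s_st = st_ / sum(st_) * 100)                  # percentage within each am x cyl

group_vars(am_last)   # "cyl" "mpg"
group_vars(am_first)  # "am" "cyl"

# Each am x cyl combination sums to 100
chk <- am_first %>% summarise(total = sum(s_st), .groups = "drop")
chk
```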

If this is not what you are looking to do, please clarify your question a little bit and I would be more than happy to take another look.

jfca283 · August 3, 2020, 4:25am

I'm sorry I wasn't clear.
The issue is that I need to compute a percentage with mutate, but with the results split.
I mean, how do I group by a list of variables (cyl, mpg), summing a third one (hp), and also compute the relative frequency of that? mutate() does that, but what if I also need to split the results by a fourth variable (am)?
The code I provided works flawlessly if I omit the group_split() step; mutate() does what I intended.
But when I add group_split(), the code no longer computes
s_st=st_/sum(st_)*100
separately for each category/value of "am".
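A minimal sketch of one way to do this, assuming dplyr 1.0.0 or later: put `am` into the grouping before `summarise()` (first, so the percentages are still computed within each `cyl`), then regroup by `am` alone and split. This follows the warning's own hint of putting `am` into the grouping rather than passing it to `group_split()`.

```r
library(dplyr)

kk <- datasets::mtcars

tables_by_am <- kk %>%
  group_by(am, cyl, mpg) %>%                           # am must be grouped before summarise()
  summarise(st_ = sum(hp), .groups = "drop_last") %>%  # result stays grouped by am, cyl
  mutate(s_st = st_ / sum(st_) * 100) %>%              # % within each cyl, separately per am
  group_by(am) %>%                                     # replace the grouping with am alone
  group_split()                                        # one tibble per value of am

length(tables_by_am)   # 2: automatic (am = 0) and manual (am = 1)
tables_by_am[[1]]
```

Within each resulting tibble, `s_st` sums to 100 for every `cyl` value present.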

nirgrahamuk · August 3, 2020, 9:48am

Hi jfca,
The way I'm interpreting this, ries9112 already gave you the solution in the second half of their post:
an additional splitting level is added by adding that variable to the grouping, so
kk %>% group_by(cyl, mpg) %>% etc.
becomes
kk %>% group_by(cyl, mpg, am) %>% etc.
and adding a fourth variable would be
kk %>% group_by(cyl, mpg, am, gear) %>% etc.

Your post implies that you might prefer to 'bolt on' the am aspect 'late' in the flow, e.g.
kk %>% group_by(cyl, mpg) %>% etc. %>% add_to_early_group(am)

However, there is no out-of-the-box facility to do this. (I could be wrong, but I expect it would require significant metaprogramming effort, perhaps a special pipe operator that captures the preceding elements of the piped flow as calls that can be analysed, so that in principle an early group_by() call could be appended to and rerun.)
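Short of that kind of metaprogramming, a simpler workaround (a different technique from the one described above, and only a sketch) is to wrap the pipeline in a function that accepts extra grouping variables and reruns the whole summary with them placed in front of `cyl` and `mpg`. `pct_table()` is a hypothetical helper name, and dplyr 1.0.0 or later is assumed:

```r
library(dplyr)

# Hypothetical helper: extra grouping variables passed in ... are placed before
# cyl and mpg, so percentages stay within cyl for each combination of them
pct_table <- function(data, ...) {
  data %>%
    group_by(..., cyl, mpg) %>%
    summarise(st_ = sum(hp), .groups = "drop_last") %>%
    mutate(s_st = st_ / sum(st_) * 100) %>%
    ungroup()
}

kk <- datasets::mtcars

base_tbl <- pct_table(kk)                          # same percentages as the original code
by_am    <- pct_table(kk, am) %>% group_split(am)  # same table, split by am
```

Because the result is ungrouped before returning, `group_split(am)` works on it directly instead of ignoring its argument.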

jfca283 · August 7, 2020, 12:44am

Thanks. I think it worked the way I needed.

system · August 28, 2020, 12:44am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.