Regroup of data by considering the number of rows

I want to regroup my dataframe if the existing group has >4 rows, it is further subdivided till it has
a maximum number of 4 rows.

set.seed(100)
dat=data.frame("Group"=rep(seq(1:5),sample(1:5)))
dat=dat %>% 
  mutate(Group_length = rep(rle(Group )$lengths, rle(Group )$lengths))
dat

Here, in the above example, Group3 has 5 rows. So, it would be divided into two groups consisting
of first 4 rows as a new group and remaining 1 row as another group

I want to have an output like this (Please ignore Temp value in the expected doutput)

set.seed(100)
dat1=data.frame("Group"=c(1,1,2,2,2,3,3,3,3,4,5,5,5,5,6),
               "Temp"=sample(1:15))
dat1

Thanks

Easy enough using the dplyr::row_number() function. This gives us the row number of each record, which when used with group_by resets the count for each group. This lets us identify when we have too many rows. I don't know if you need to adjust the group id of all the subsequent groups as you do in your example, so I made things a little simpler by just adding a sub group suffix for the new group notation.
As a side note dplyr::n() will tell you how many total rows are in each group.

library(dplyr)

set.seed(100)
dat=data.frame("Group"=rep(seq(1:5),sample(1:5)))
dat=dat %>% 
  group_by(Group)  %>%
  mutate(Group_length = n()) %>% 
  ungroup()

MAX_GROUP = 4
dat %>% 
  group_by(Group) %>% 
  mutate(new_group = 
           paste(Group, floor(row_number()/ (MAX_GROUP + 1)), sep = "-")) %>% 
  ungroup()

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.