How to keep the removed rows in a separate group using group_by() & filter() in dplyr

In the dplyr package, I am using the group_by() function and then applying filter() to remove some rows from each group. Now I want to put the removed rows (the ones left out of the group) into a whole new group.

This is my code:

threshold <- dummy %>%
  group_by(expiry_date, location_code, model, age, Emp_id) %>%
  filter(Date <= as.Date(min(Date) + 2), .preserve = TRUE) %>%
  arrange(expiry_date, location_code, model, age, Emp_id)

This gives me the rows that pass the filter, but I also want to keep the removed rows, in a different group. Could you suggest a workaround for this? Thank you!

If there are rows that you want in your dataset, then you shouldn't filter them out (which, in effect, gets rid of them). Depending on what your goal is, there are a number of different ways to approach this. I can't totally tell from your example, but if you're trying to do something to one group and not another, you can use if_else()- or case_when()-type logic.
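As a rough illustration of the if_else() idea, here is a minimal sketch that labels rows instead of dropping them. The data frame `toy` and the column `window` are made up for this example; only `Emp_id`, `Date` and the two-day rule come from the question.

```r
library(dplyr)

# Hypothetical toy data, not the asker's real dataset
toy <- data.frame(
  Emp_id = c(1, 1, 1, 2, 2),
  Date   = as.Date(c("2019-01-01", "2019-01-02", "2019-01-10",
                     "2019-03-05", "2019-03-20"))
)

labelled <- toy %>%
  group_by(Emp_id) %>%
  # Label each row relative to its group's earliest Date instead of filtering
  mutate(window = if_else(Date <= min(Date) + 2, "within_2_days", "later")) %>%
  ungroup()

labelled
```

Every row is still there, and the `window` column can be used later for grouping or for a filter on either side of the split.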

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

Thanks for the quick response. I'll again try to explain the problem.

Let's say I have 4 features:

  1. Creation_date
  2. Age
  3. location_code
  4. device_type

And total rows = 1000
Now, I want to create groups/clusters which have the same values for 3 columns namely - Age, location_code and device_type.

All the groups formed will have all the 4 features.
Now in each group, the values of "Age", "location_code" and "device_type" will be same for each row.

And value of "Creation_date" may or may not be same for each row.

Now, after creating the groups, let's analyze the first group. In this group,
I want to keep a row only if its Creation_date is <= min(Creation_date) + 2 days.
Rows which do not satisfy this condition should go into a different cluster (or group).

I want to repeat this process for all the groups formed.

I think this explanation might help in understanding the problem.
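The rule described above could be sketched roughly as follows. This is only one possible reading of the request: the data frame `devices` and the flag column `within_window` are hypothetical names chosen for the example, and the six rows are invented.

```r
library(dplyr)

# Hypothetical toy data using the four columns described above
devices <- data.frame(
  Creation_date = as.Date(c("2019-01-01", "2019-01-03", "2019-01-09",
                            "2019-02-01", "2019-02-02", "2019-02-20")),
  Age           = c(35, 35, 35, 55, 55, 55),
  location_code = c("a", "a", "a", "b", "b", "b"),
  device_type   = c("X", "X", "X", "Y", "Y", "Y")
)

regrouped <- devices %>%
  group_by(Age, location_code, device_type) %>%
  # TRUE for rows within 2 days of the group's earliest Creation_date
  mutate(within_window = Creation_date <= min(Creation_date) + 2) %>%
  # Regroup with the flag so the rows that fail the test form their own group
  group_by(Age, location_code, device_type, within_window)

# Number of rows in each resulting group
group_sizes <- count(regrouped)
group_sizes
```

Because the flag is added with mutate() and not filter(), no rows are lost; each original group is simply split in two.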

Is there a typo? It seems guaranteed that any positive number would be smaller than itself plus a positive integer, so that is not a useful criterion to split on.

That said, here's a basic example of doing this kind of thing.

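library(tidyverse)

set.seed(42)

# Example data: 1000 rows with a date and three grouping columns
(exdf <- tibble(
  creation_date = sample(
    seq.Date(from = as.Date("2019/01/01"), by = "day", length.out = 400),
    size = 1000,
    replace = TRUE
  ),
  age = sample.int(3, size = 1000, replace = TRUE) * 20 + 15,
  location = sample(letters[1:5], size = 1000, replace = TRUE),
  device = sample(LETTERS, size = 1000, replace = TRUE)
))

# One row per (age, location, device) group, with its average date and a group number
(group_df <- exdf %>%
  group_by(age, location, device) %>%
  summarise(avg_date = mean(creation_date)) %>%
  ungroup() %>%
  arrange(age, location, device) %>%
  mutate(group_num = row_number()))

# Join the group info back onto the rows, then split each group by whether
# the row's date is later than the group average
(df2 <- left_join(group_df, exdf, by = c("age", "location", "device")) %>%
  mutate(
    greater_than_average = creation_date > avg_date,
    # Suffix "T" or "F" (first letter of TRUE/FALSE) marks the sub-group
    final_group_code = paste0(group_num, str_sub(greater_than_average, start = 1, end = 1))
  ))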

Hi,

Yes, there was a typo in my previous reply; I have corrected it.
But I understood your logic and it's working fine for me.

Thanks a lot!
