Removing rows based on column conditions in RStudio with filter (dplyr)

Suppose we have a data frame:

Event <- c("A", "A", "A", "B", "B", "C", "C", "C")
Model <- c(1, 2, 3, 1, 2, 1, 2, 3)

df <- data.frame(Event, Model)

print(df)

Which looks like this:

Event Model
A 1
A 2
A 3
B 1
B 2
C 1
C 2
C 3

We can see that event B only has 2 models. The actual data frame I am using has thousands of rows and 17 columns, so how can I remove all events that do not have 3 models? My guess is to use a filter, but I am not sure how to do it when there is more than one condition.

I tried the code below:

library(dplyr)

df %>% group_by(Event) %>%
  filter(max(Model) == 3)

However, this would wrongly keep events that have a Model 3 but are missing another model, for example:

Event Model
A 1
A 3

Many thanks.

Try this:

df %>% filter(Event %in% c(
    df %>% group_by(Event) %>% summarise(n=n_distinct(Model)) %>% 
      filter(n>=3) %>% pull(Event)
 ))

Or you can do it in two steps:

events.over.3 <- df %>% group_by(Event) %>% summarise(n=n_distinct(Model)) %>% 
  filter(n>=3) %>% pull(Event)

df %>% filter(Event %in% events.over.3) 
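
The same result can probably be reached in a single grouped filter, since filter() evaluates its condition once per group after group_by(). A minimal sketch, assuming dplyr is installed and reusing the example data from the question (the name complete_events is only illustrative):

```r
library(dplyr)

# Example data from the question
Event <- c("A", "A", "A", "B", "B", "C", "C", "C")
Model <- c(1, 2, 3, 1, 2, 1, 2, 3)
df <- data.frame(Event, Model)

# Keep only the events that have at least 3 distinct models.
# n_distinct(Model) is computed separately for each Event group.
complete_events <- df %>%
  group_by(Event) %>%
  filter(n_distinct(Model) >= 3) %>%
  ungroup()

print(complete_events)
```

If the requirement is specifically that models 1, 2 and 3 are all present, a condition such as all(1:3 %in% Model) inside filter() would state that more directly.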

Great, I managed to do it with this:

Filtered_df <- df %>% group_by(Event) %>% filter(length(Model) >= 3)

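Be careful: length(Model) counts rows, not distinct models, so this version gives the wrong result when duplicated observations exist.
For example, if there are 2 rows with Event = B and Model = 2, event B has 3 rows and is kept even though it only has 2 distinct models.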
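
To illustrate the caveat, here is a small sketch that adds a hypothetical duplicated row (Event = B, Model = 2) to the example data. The names df_dup, by_rows and by_distinct are made up for this illustration. Counting rows with length() keeps event B, while counting distinct values with n_distinct() drops it:

```r
library(dplyr)

# Example data with one duplicated observation: Event = "B", Model = 2
df_dup <- data.frame(
  Event = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Model = c(1, 2, 3, 1, 2, 2, 1, 2, 3)
)

# Row count per group: B has 3 rows, so it passes the filter
by_rows <- df_dup %>%
  group_by(Event) %>%
  filter(length(Model) >= 3) %>%
  ungroup()

# Distinct models per group: B has only 2, so it is removed
by_distinct <- df_dup %>%
  group_by(Event) %>%
  filter(n_distinct(Model) >= 3) %>%
  ungroup()

as.character(unique(by_rows$Event))      # "A" "B" "C"
as.character(unique(by_distinct$Event))  # "A" "C"
```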
