Filtering duplicate rows with conditions

realusername · May 16, 2019, 5:27pm

Let's say I have a dataset with duplicate rows like this:

Josh Green | 2010-2011
Josh Green | 2011-2012
Josh Green | 2012-2013
Sam White | 2011-2012
Sam White | 2013-2014
Sam White | 2016-2017
Paul Grays | 2010-2011

And I want to keep only unique names with the most recent date like this:

Josh Green | 2012-2013
Sam White | 2016-2017
Paul Grays | 2010-2011

Any way to accomplish that?

martin.R · May 16, 2019, 5:44pm

This is one way:

library(dplyr)

df %>% 
  group_by(name) %>% 
  arrange(desc(year), .by_group = TRUE)
  filter(row_number() == 1)

realusername · May 16, 2019, 5:51pm

Perfect, thank you so much.

Just in case anyone wants to use this: Martin's code is missing a '%>%' after the arrange() line, so it should look like this:

df %>% 
  group_by(name) %>% 
  arrange(desc(year), .by_group = TRUE) %>%
  filter(row_number() == 1)
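
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data and runs the pipeline on it. The column names `name` and `year` are assumptions (the original data was shown without headers), and the result name `latest` is just for illustration. The years are stored as text here, which still sorts correctly because every value has the same "YYYY-YYYY" shape.

```r
library(dplyr)

# Hypothetical data frame; column names `name` and `year` are assumed
df <- data.frame(
  name = c("Josh Green", "Josh Green", "Josh Green",
           "Sam White", "Sam White", "Sam White", "Paul Grays"),
  year = c("2010-2011", "2011-2012", "2012-2013",
           "2011-2012", "2013-2014", "2016-2017", "2010-2011"),
  stringsAsFactors = FALSE
)

latest <- df %>%
  group_by(name) %>%                          # one group per person
  arrange(desc(year), .by_group = TRUE) %>%   # newest year first within each group
  filter(row_number() == 1) %>%               # keep only the first row of each group
  ungroup()                                   # drop the grouping from the result

print(latest)
```

Because of `.by_group = TRUE`, the rows come back sorted by name rather than in their original order, so the result may not line up row for row with the input.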
