Filtering duplicate rows with conditions

Let's say I have a dataset where the same name appears in several rows, like this:

Josh Green | 2010-2011
Josh Green | 2011-2012
Josh Green | 2012-2013
Sam White | 2011-2012
Sam White | 2013-2014
Sam White | 2016-2017
Paul Grays | 2010-2011

I want to keep one row per name, the one with the most recent date, like this:

Josh Green | 2012-2013
Sam White | 2016-2017
Paul Grays | 2010-2011

Any way to accomplish that?

This is one way:

library(dplyr)

df %>% 
  group_by(name) %>% 
  arrange(desc(year), .by_group = TRUE)
  filter(row_number() == 1)

Perfect, thank you so much.

Just in case anyone wants to use this: Martin left out a '%>%' at the end of the arrange() line, so the working version looks like this:

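df %>%
  group_by(name) %>%                            # one group per name
  arrange(desc(year), .by_group = TRUE) %>%     # newest year first within each group
  filter(row_number() == 1)                     # keep the first (most recent) row per group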
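
For anyone who wants to try this end to end, here is a minimal sketch that builds the sample data from the question and runs the corrected pipeline on it. The question never names the columns, so `name` and `year` are assumed from the answer's code, and `latest` and `latest_alt` are just illustrative variable names. Sorting works on the raw strings only because values like "2010-2011" sort alphabetically in the same order as chronologically. The second pipeline is an alternative that is not from the thread: it uses slice_max(), which needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question. Column names `name` and `year` are assumed.
df <- data.frame(
  name = c("Josh Green", "Josh Green", "Josh Green",
           "Sam White", "Sam White", "Sam White", "Paul Grays"),
  year = c("2010-2011", "2011-2012", "2012-2013",
           "2011-2012", "2013-2014", "2016-2017", "2010-2011"),
  stringsAsFactors = FALSE
)

# Approach from the thread: sort each group newest-first, keep the first row.
latest <- df %>%
  group_by(name) %>%
  arrange(desc(year), .by_group = TRUE) %>%
  filter(row_number() == 1) %>%
  ungroup()

# Alternative (not from the thread): slice_max() keeps the row with the
# largest `year` in each group; with_ties = FALSE returns exactly one row.
latest_alt <- df %>%
  group_by(name) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

Both versions end with ungroup() so the result is a plain, ungrouped table; drop that step if you want to keep working by group.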
