Duplicates with a flag for all duplicated rows

Slavek · September 7, 2022, 8:47am

Hi, I have this simple df:

source.data <- 
data.frame(
  stringsAsFactors = FALSE,
               VRN = c("A8PWA","A8WTW","A9CGB",
                       "A9CGB","AJ18SHZ","AJ18VJU","AJ18VJU","AJ18VJU"),
  NPSResponseDate = c("2021-09-28 14:21:05",
                       "2021-10-14 13:04:01","2021-10-29 15:44:45",
                       "2021-12-15 11:21:59","2022-05-09 13:01:34","2022-04-21 12:50:06",
                       "2022-05-27 20:52:05","2022-07-15 23:34:26")
)

where I managed to find duplicates and check sequence of of these duplicates.

library(tidyverse)
library(lubridate)
result <- source.data %>% 
  group_by(VRN) %>% 
  arrange(NPSResponseDate)%>% 
  mutate(Sequence = row_number()) %>% 
  ungroup() %>% 
  arrange(VRN, NPSResponseDate)

result$Duplicated.VRN <- duplicated(result$VRN)

result

I need something different to this standard. I would like to flag all VRSs which are duplicated so duplicate flags should be next to row 3 and 4 and then 6-8. How can I do that?

ismirsehregal · September 7, 2022, 8:55am

We can use the fromLast parameter:

library(dplyr)
library(lubridate)

result <- source.data %>% 
  group_by(VRN) %>% 
  arrange(NPSResponseDate)%>% 
  mutate(Sequence = row_number()) %>% 
  ungroup() %>% 
  arrange(VRN, NPSResponseDate)

result$Duplicated.VRN <- duplicated(source.data$VRN) | duplicated(source.data$VRN, fromLast = TRUE)

result

nirgrahamuk · September 7, 2022, 9:08am

you could solve this staying within tidyverse, extending the mutate you have :

result <- source.data %>% 
  group_by(VRN) %>% 
  arrange(NPSResponseDate)%>% 
  mutate(Sequence = row_number(),
         Duplicated.VRN=n()>1) %>% 
  ungroup() %>% 
  arrange(VRN, NPSResponseDate)

system · September 14, 2022, 9:09am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.