Need help to handle duplicates

Hi All,

I am new to RStudio and mostly use it to harmonize data using data frames or data tables.

I am having trouble handling duplicates in a data frame.
Here is what is inside the data frame.

ID Value
1 109392p
2 208765ap
2 1208828z
1 zya10975p
1 2010789t

What I need is below:
ID Value ID_new
1 109392p 1_1
2 208765ap 2_1
2 1208828z 2_2
1 zya10975p 1_2
1 2010789t 1_3

To simplify, I just need a unique key based on the number of times each ID occurs.
In this example, ID 1 occurs 3 times, so its new IDs will be 1_1, 1_2, 1_3, and likewise for ID 2.

I tried this logic with a for loop and it worked well on small data, but when I ran it on 1 million records it took more than an hour and the loop was still running.

If anyone can come up with a solution without a loop, it would be much appreciated.

Thanks
Anshu

Here is one solution.

library(dplyr)

DF <- data.frame(ID = c(1,2,2,1,1), Value = c("Q", "W", "A", "D", "E"))
DF
#>   ID Value
#> 1  1     Q
#> 2  2     W
#> 3  2     A
#> 4  1     D
#> 5  1     E
# row_number() with no argument numbers the rows within each ID group in their current order
DF <- DF %>% group_by(ID) %>% mutate(ID_New = paste(ID, row_number(), sep = "_"))
DF
#> # A tibble: 5 x 3
#> # Groups:   ID [2]
#>      ID Value ID_New
#>   <dbl> <fct> <chr> 
#> 1     1 Q     1_1   
#> 2     2 W     2_1   
#> 3     2 A     2_2   
#> 4     1 D     1_2   
#> 5     1 E     1_3

Created on 2019-08-29 by the reprex package (v0.2.1)
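
If you would rather not load dplyr, the same per-ID counter can probably be built in base R as well. This is only a minimal sketch on the same toy data, assuming the real data frame has the same ID column; the helper variable `occurrence` is just a name I made up for illustration. I have not timed it on 1 million rows, so treat the speed as something to check rather than a promise.

```r
# Same example data as above
DF <- data.frame(ID = c(1, 2, 2, 1, 1),
                 Value = c("Q", "W", "A", "D", "E"))

# ave() applies seq_along() within each ID group and returns the
# counters in the original row order, so no explicit loop is needed
occurrence <- ave(seq_along(DF$ID), DF$ID, FUN = seq_along)

# Combine the ID and its within-group counter into the new key
DF$ID_New <- paste(DF$ID, occurrence, sep = "_")
DF$ID_New
#> [1] "1_1" "2_1" "2_2" "1_2" "1_3"
```

Since you mentioned data tables, the data.table package also has a rowid() function that produces this kind of within-group counter, which may be worth trying if the data is already a data.table.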

Bravo. That is exactly what I was looking for.
A big thank you. Much appreciated :innocent:

Thanks,
Anshu
