I have two data frames. This is just a sample; the real database has approximately 1 million records.
The columns can contain names, emails, alphanumeric codes, etc.
data1<-data.frame(
'ID 1' = c(86364,"ARV_2612","AGH_2212","IND_2622","CHG_2622"),
sector = c(3,3,1,2,5),
name=c("nhug","hugy","mjuk","ghtr","kuld"),
'Enternal code'=c(1,1,1,1,3),
col3=c(1,1,0,0,0),
col4=c(1,0,0,0,0),
col5=c(1,0,1,1,1)
)
data2<-data.frame(
'ID 1' = c(53265,"ARV_7362",76354,"IND_2622","CHG_9762"),
sector = c(3,3,1,2,5),
name=c("nhug","hugy","mjuk","ghtr","kuld"),
'Enternal code'=c(1,1,1,1,3),
col3=c(1,1,0,0,0),
col4=c(1,0,0,0,0),
col5=c(1,0,1,1,1)
)
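library(dplyr)
# data.frame() renames 'ID 1' to ID.1 (check.names = TRUE by default),
# so the column has to be referenced as ID.1 here.
data2 %>% mutate(
duplicated = factor(if_else(ID.1 %in%
pull(data1, ID.1),
1,
0)))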
I am new to R. I am looking for a function that mutates one of my data frames (data2): given a column name from data1 and a column name from data2, it should check whether each value or string in the data2 column already exists in the data1 column, and add a new column set to 1 for true and 0 for false.
The function call would look like:
func(data1 = "name",data2="name",mutated_com="name_exist")
The mutated data frame would look like:
External.ID | sector | col1 | Enternal.code | col3 | col4 | col5 | duplicate |
---|---|---|---|---|---|---|---|
53265 | 3 | 1 | 1 | 1 | 1 | 1 | 0 |
ARV_7362 | 3 | 1 | 1 | 1 | 0 | 0 | 0 |
76354 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
IND_2622 | 2 | 0 | 1 | 0 | 0 | 1 | 1 |
CHG_9762 | 5 | 0 | 3 | 0 | 0 | 1 | 1 |
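
To make the request concrete, here is a minimal sketch of what such a function could look like, assuming dplyr is acceptable. The name `flag_existing` and its arguments are hypothetical (not from any package), and the two small data frames are stand-ins for the sample data, using the column names that `data.frame()` produces by default (`ID.1`).

```r
library(dplyr)

# Hypothetical helper: adds a 1/0 column to `df2` marking whether each
# value of df2[[col2]] also occurs anywhere in df1[[col1]].
flag_existing <- function(df1, df2, col1, col2, new_col) {
  df2 %>%
    mutate(!!new_col := as.integer(.data[[col2]] %in% df1[[col1]]))
}

# Small stand-in data (assumed, shortened from the sample above).
d1 <- data.frame(ID.1 = c("86364", "ARV_2612", "IND_2622"),
                 name = c("nhug", "hugy", "ghtr"))
d2 <- data.frame(ID.1 = c("53265", "ARV_7362", "IND_2622"),
                 name = c("nhug", "hugy", "ghtr"))

# Flag matching IDs, then flag matching names on the result.
res <- flag_existing(d1, d2, col1 = "ID.1", col2 = "ID.1", new_col = "duplicate")
res <- flag_existing(d1, res, col1 = "name", col2 = "name", new_col = "name_exist")
print(res)
```

Column names are passed as strings so the same function can be reused for any pair of columns. `%in%` only tests membership, so a value that appears several times in data1 still gives a single 1 in data2.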