mutate new dataframe id duplicate with old dataframe

I have two data frame, this is just a sample , database have approx 1 million of records.

data1<-data.frame(
  'External ID' = c(86364,"ARV_2612","AGH_2212","IND_2622","CHG_2622"),
  sector = c(3,3,1,2,5),
  col1=c(1,1,0,0,0),
  'Enternal code'=c(1,1,1,1,3),
  col3=c(1,1,0,0,0),
  col4=c(1,0,0,0,0),
  col5=c(1,0,1,1,1)
)

data2<-data.frame(
  'External ID' = c(53265,"ARV_7362",76354,"IND_2622","CHG_9762"),
  sector = c(3,3,1,2,5),
  col1=c(1,1,0,0,0),
  'Enternal code'=c(1,1,1,1,3),
  col3=c(1,1,0,0,0),
  col4=c(1,0,0,0,0),
  col5=c(1,0,1,1,1)
)

new to R,now i am looking for a approach to mutate my one data frame (data2) like. the code while will automatically find the column "External ID", then add a new column in data2 like duplicate and this column will show Y or N if the external_id in data2 present in data1(external ID) any simple solution....?? the output should be like

External.ID sector col1 Enternal.code col3 col4 col5 duplicate
53265 3 1 1 1 1 1 N
ARV_7362 3 1 1 1 0 0 N
76354 1 0 1 0 0 1 N
IND_2622 2 0 1 0 0 1 Y
CHG_9762 5 0 3 0 0 1 N
data2 %>% mutate(
  duplicated = factor(if_else(External.ID %in% 
                                pull(data1, External.ID),
                              "Yes", 
                              "No")))

can we pull column External.ID like select(matches("^External|ID") because i want to make a function to this , so that it can dynamically select that External_ID if matches column in data1

you are saying that there can be deviation in how external ID column is named, so you want regex matching between columns of dataframes where external and ID appear as tokens in a column.
I'd suggest its more elegant to do a function that is limited to identifying the externaID columns, and then use the result from calling that function to then do the duplicated check using the approach I suggested.

yes ....
I'd suggest its more elegant to do a function that is limited to identifying the externaID columns .....??
how ....??

data1<-data.frame(
  'ExternalID' = c(86364,"ARV_2612","AGH_2212","IND_2622","CHG_2622"),
  sector = c(3,3,1,2,5)
)

data2<-data.frame(
  'External ID' = c(53265,"ARV_7362",76354,"IND_2622","CHG_9762"),
  sector = c(3,3,1,2,5)
)


getexidnames<- function(data1,data2){
(d1name <- names(data1) %>% stringr::str_subset("^External|ID"))
(d2name <- names(data2) %>% stringr::str_subset("^External|ID"))
list(d1name=d1name,
     d2name=d2name)
}

getexidnames(data1,data2)

thanks for help, it worked for me

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.