Hi there,
I am trying to identify, for each reference in `df1`, which references in `df2` are candidate (approximate) matches.
I would also like to know about other approaches I could apply here, not only stringdist.
The error I get is `cannot allocate vector of size 1.6 Gb` (one data frame has 3,000 observations and the other 163,000).
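A distance-based fuzzy join has to score every pair of keys, which here is up to 3,000 × 163,000 ≈ 489 million pairs, so one large allocation is likely the cause. A minimal sketch of one way around it, assuming the `stringdist` package and the default `"osa"` distance: compare one chunk of `df1` keys against all `df2` keys at a time, so only `chunk_size × length(y)` distances sit in memory at once (about 16 million doubles for a chunk of 100 against 163,000 keys). The helper name `chunked_candidates` and its arguments are hypothetical, not part of any package.

```r
library(stringdist)

# Hypothetical helper (not part of stringdist or fuzzyjoin): compares the keys in
# `x` against all keys in `y`, one chunk of `x` at a time, so at most
# chunk_size * length(y) distances are held in memory at once.
chunked_candidates <- function(x, y, max_dist, chunk_size = 100, method = "osa") {
  chunks <- split(seq_along(x), ceiling(seq_along(x) / chunk_size))
  out <- lapply(chunks, function(idx) {
    # length(idx) x length(y) matrix of distances for this chunk only
    d <- stringdistmatrix(x[idx], y, method = method)
    hits <- which(d <= max_dist, arr.ind = TRUE)
    data.frame(x_row = idx[hits[, "row"]],  # row number in x
               y_row = hits[, "col"],       # row number in y
               dist  = d[hits])
  })
  res <- do.call(rbind, unname(out))
  rownames(res) <- NULL
  res
}

# Toy keys taken from the question
x <- c("SHMBF7", "2257211011", "22572110112020", "22572110112021")
y <- c("SHMBF", "SHMB", "84413844", "22572110112019", "22572110112020", "22572110112021")
chunked_candidates(x, y, max_dist = 1, chunk_size = 2)
```

The returned `x_row`/`y_row` indices can then be used to pull the matching rows out of `df1` and `df2`. Larger chunks are faster but use more memory, so `chunk_size` is the knob to tune against the available RAM.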
Is there another technique that can handle a task of this size?
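One commonly used alternative is blocking: only score pairs that could possibly pass the threshold. For edit distances such as Levenshtein (`"lv"`), optimal string alignment (`"osa"`) and full Damerau-Levenshtein (`"dl"`), each edit changes a string's length by at most one, so `|nchar(a) - nchar(b)|` is a lower bound on the distance and pairs whose lengths differ by more than `max_dist` can be skipped. The sketch below assumes the `stringdist` package; `blocked_candidates` is a hypothetical name. This pruning does not hold for q-gram, cosine, Jaccard or Jaro-Winkler distances, and it removes little when `max_dist` is as large as the keys are long.

```r
library(stringdist)

# Hypothetical helper: length-based blocking. For "lv", "osa" and "dl",
# |nchar(a) - nchar(b)| <= distance(a, b), so y keys whose length differs from
# x[i] by more than max_dist are skipped without computing a distance.
blocked_candidates <- function(x, y, max_dist, method = "osa") {
  stopifnot(method %in% c("lv", "osa", "dl"))  # bound only valid for these
  ny <- nchar(y)
  out <- lapply(seq_along(x), function(i) {
    keep <- which(abs(ny - nchar(x[i])) <= max_dist)  # candidate rows of y
    if (length(keep) == 0) return(NULL)
    d <- stringdist(x[i], y[keep], method = method)  # x[i] is recycled
    ok <- d <= max_dist
    if (!any(ok)) return(NULL)
    data.frame(x_row = i, y_row = keep[ok], dist = d[ok])
  })
  do.call(rbind, out)  # NULL entries are dropped by rbind
}

# Toy keys taken from the question
x <- c("SHMBF7", "2257211011", "22572110112020", "22572110112021")
y <- c("SHMBF", "SHMB", "84413844", "22572110112019", "22572110112020", "22572110112021")
blocked_candidates(x, y, max_dist = 1)
```

Because it loops over `x` one key at a time, memory stays proportional to `length(y)`; the same idea extends to other blocking keys (for example a shared prefix) if exact agreement on that key is acceptable for the data.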
``` r
library(dplyr)
library(fuzzyjoin)
df1 <- data.frame(Reference = c("SHMBF7", "2257211011", "22572110112020", "22572110112021"), ID = 1:4)
df2 <- data.frame(Reference = c("SHMBF", "SHMB", "84413844", "22572110112019", "22572110112020", "22572110112021"), ID = 1:6)
# Fuzzy join on Reference: keeps the df1/df2 pairs whose distance is <= max_dist
df1 %>%
  stringdist_inner_join(df2, by = c(Reference = "Reference"), max_dist = 10)
```

Created on 2020-10-02 by the reprex package (v0.3.0)
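Separately from the memory issue, `max_dist = 10` is close to the length of the keys themselves (4 to 14 characters), so it filters out very little. A quick check on the toy keys, assuming the `stringdist` package and the `"osa"` method (the default for `stringdist_inner_join`), shows how the share of surviving pairs grows with the threshold; on the full data, a loose threshold also makes the joined result itself very large.

```r
library(stringdist)

# Toy keys from the question (ref1/ref2 are illustrative names)
ref1 <- c("SHMBF7", "2257211011", "22572110112020", "22572110112021")
ref2 <- c("SHMBF", "SHMB", "84413844", "22572110112019", "22572110112020", "22572110112021")

# All pairwise OSA distances: a length(ref1) x length(ref2) matrix
d <- stringdistmatrix(ref1, ref2, method = "osa")

# Share of ref1 x ref2 pairs kept at each candidate threshold
thresholds <- c(1, 2, 4, 10)
kept <- sapply(thresholds, function(k) mean(d <= k))
data.frame(max_dist = thresholds, share_kept = kept)
```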