Hi there,
I am trying to identify which entries in DF1 are candidate matches for the entries in DF2.
It would also be great to learn about other solutions I could apply here, not only stringdist.
The error I get is the following: `cannot allocate vector of size 1.6 Gb` (using one data frame of 3,000 observations and another of 163,000).
Is there any other technique that can handle a problem of this size?
``` r
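library(dplyr)      # provides %>%
library(fuzzyjoin)  # provides stringdist_inner_join()

# The join column must have the same spelling ("Reference") in both data frames
df1 <- data.frame(Reference = c("SHMBF7", "2257211011", "22572110112020", "22572110112021"), ID = 1:4)
df2 <- data.frame(Reference = c("SHMBF", "SHMB", "84413844", "22572110112019", "22572110112020", "22572110112021"), ID = 1:6)
df1 %>%
  stringdist_inner_join(df2, by = c(Reference = "Reference"), max_dist = 10)
```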
Keep your environment clean of other large objects (only have loaded in memory the minimum that you need).
And I suppose you could perform the calculation in chunks rather than in one go (only use a portion of df2 at a time).
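
As a rough sketch of the chunking idea, assuming both data frames have a `Reference` and an `ID` column as in the question: the helper `chunked_fuzzy_join()` below is hypothetical (it is not part of fuzzyjoin or stringdist). It loops over slices of df2 and only ever builds one small distance matrix at a time. To stay free of extra packages it uses base R's `utils::adist()` (Levenshtein distance), which is a swapped-in distance and not the default one used by `stringdist_inner_join()`, so results may differ when characters are transposed. The values `max_dist = 2` and `chunk_size = 2` are only illustrative.

``` r
df1 <- data.frame(Reference = c("SHMBF7", "2257211011", "22572110112020", "22572110112021"),
                  ID = 1:4, stringsAsFactors = FALSE)
df2 <- data.frame(Reference = c("SHMBF", "SHMB", "84413844", "22572110112019",
                                "22572110112020", "22572110112021"),
                  ID = 1:6, stringsAsFactors = FALSE)

# Hypothetical helper: fuzzy inner join computed one slice of `big` at a time.
# Assumes both inputs have columns named Reference and ID.
chunked_fuzzy_join <- function(small, big, max_dist = 2, chunk_size = 1000) {
  starts <- seq(1, nrow(big), by = chunk_size)
  pieces <- lapply(starts, function(start) {
    rows  <- start:min(start + chunk_size - 1, nrow(big))
    chunk <- big[rows, , drop = FALSE]
    # Distance matrix for this slice only: nrow(small) x nrow(chunk)
    d   <- utils::adist(small$Reference, chunk$Reference)
    # Row/column positions of all pairs within the allowed distance
    hit <- which(d <= max_dist, arr.ind = TRUE)
    data.frame(
      Reference.x = small$Reference[hit[, "row"]],
      ID.x        = small$ID[hit[, "row"]],
      Reference.y = chunk$Reference[hit[, "col"]],
      ID.y        = chunk$ID[hit[, "col"]],
      dist        = d[hit],
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-chunk matches into one result
  do.call(rbind, pieces)
}

matches <- chunked_fuzzy_join(df1, df2, max_dist = 2, chunk_size = 2)
print(matches)
```

With 3,000 rows in df1 and `chunk_size = 1000`, each distance matrix would hold 3000 x 1000 doubles, roughly 24 MB, instead of one matrix covering all 163,000 rows of df2. The same loop should also work with `stringdist_inner_join(small, chunk, ...)` in place of the `adist()` step if you prefer to stay with fuzzyjoin.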