Join using stringdist or other techniques

Hi there,
I am trying to identify, for the entries in DF1, which entries in DF2 are candidate matches.
In addition, it would be great to learn about other solutions I could apply here, not only stringdist.

The error I get is the following: cannot allocate vector of size 1.6 Gb (using one data frame of 3,000 observations and another of 163,000).
Is there any other technique that can handle a join of this size?

``` r
df1 <- data.frame(Rerefence = c("SHMBF7", "2257211011", "22572110112020", "22572110112021"), ID = 1:4)

df2 <- data.frame(Reference = c("SHMBF", "SHMB", "84413844", "22572110112019", "22572110112020", "22572110112021"), ID = 1:6)

df1 %>%
  stringdist_inner_join(df2, by = c(Reference = "Reference"), max_dist = 10)
```

Created on 2020-10-02 by the reprex package (v0.3.0)

You need to load either dplyr or magrittr (or the tidyverse) in order to access the pipe operator %>%.

I have installed tidyverse. I don't know why it is not working. Can you please try with my reprex? Thanks.

You have to load packages to use them:
library(tidyverse)

I found the mistake: the variable name in DF1 was mistyped (Rerefence instead of Reference).
Thanks
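
For anyone landing here later, a corrected version of the reprex might look like the sketch below. It assumes `stringdist_inner_join()` comes from the fuzzyjoin package (the original post does not say which package was in use) and that dplyr supplies the pipe. `stringsAsFactors = FALSE` is added as a precaution for R versions before 4.0.

``` r
library(dplyr)      # provides the pipe %>%
library(fuzzyjoin)  # assumed source of stringdist_inner_join()

# Column name fixed: "Reference", not "Rerefence"
df1 <- data.frame(
  Reference = c("SHMBF7", "2257211011", "22572110112020", "22572110112021"),
  ID = 1:4,
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  Reference = c("SHMBF", "SHMB", "84413844",
                "22572110112019", "22572110112020", "22572110112021"),
  ID = 1:6,
  stringsAsFactors = FALSE
)

# Columns shared by both tables come back with .x (df1) and .y (df2) suffixes
matches <- df1 %>%
  stringdist_inner_join(df2, by = "Reference", max_dist = 10)
```

Note that `max_dist = 10` is kept from the original post and is very permissive: with the default edit-distance method, any two strings of at most 10 characters are within distance 10 of each other. A smaller threshold, possibly combined with the `distance_col` argument to inspect the actual distances, is probably closer to what is wanted.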

The error I get is the following: cannot allocate vector of size 1.6 Gb (using one data frame of 3,000 observations and another of 163,000).
Is there any other technique that can handle a join of this size?

Keep your environment clear of other large objects (only keep in memory the minimum that you need).
And, I suppose, perform the calculation in chunks rather than in one go (join against only a portion of df2 at a time).
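
The chunking idea could be sketched roughly as follows. This is only an illustration, not a tested fix for the 163,000-row case: `chunked_stringdist_join()` and its `chunk_size` argument are hypothetical names made up for this example, and it again assumes the join function comes from fuzzyjoin. The expectation is that peak memory follows the size of one chunk rather than all of df2, on the assumption that the allocation is driven by the number of candidate pairs compared at once.

``` r
library(dplyr)
library(fuzzyjoin)  # assumed source of stringdist_inner_join()

# Hypothetical helper: fuzzy-join x against y one slice of y at a time,
# then stack the partial results.
chunked_stringdist_join <- function(x, y, by, max_dist, chunk_size = 5000) {
  # Label each row of y with the chunk it belongs to (at most chunk_size rows each)
  chunk_id <- ceiling(seq_len(nrow(y)) / chunk_size)
  pieces <- lapply(split(y, chunk_id), function(y_chunk) {
    stringdist_inner_join(x, y_chunk, by = by, max_dist = max_dist)
  })
  bind_rows(pieces)
}

# Small usage example with the data from the question (column name corrected)
df1 <- data.frame(
  Reference = c("SHMBF7", "2257211011", "22572110112020", "22572110112021"),
  ID = 1:4,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Reference = c("SHMBF", "SHMB", "84413844",
                "22572110112019", "22572110112020", "22572110112021"),
  ID = 1:6,
  stringsAsFactors = FALSE
)

# max_dist = 2 and chunk_size = 2 are illustrative values only
chunked <- chunked_stringdist_join(df1, df2, by = "Reference",
                                   max_dist = 2, chunk_size = 2)
```

The rows come back grouped by chunk rather than in the order a single join would give, so sort afterwards if the order matters. For the real data, `chunk_size` would need tuning to whatever fits in memory.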
