I have a very large dataset, which looks like this.
I have two types of data frames:
- my reference data.frame
- my experimental data.frame
I want to match the ref and expr data.frames and compute the Levenshtein distance between their strings. The output could look like this:
ref        expr       distance
cake       cak        1
cake       cakee      1
cake       cake       0
cake       rownies    ...
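One way to produce a table like this is to cross-join the two string columns and compute the distance for every pair. The sketch below is written in Python rather than R, and it assumes that `ref` and `expr` are plain lists of strings (the example values are simply the strings that appear in the tables in this post). `levenshtein` and `pairwise_distances` are hypothetical helper names, not functions from any package.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def pairwise_distances(ref, expr):
    """Cross join: one (ref, expr, distance) row for every reference/experimental pair."""
    return [(r, e, levenshtein(r, e)) for r, e in product(ref, expr)]


# Example strings taken from the tables in the question (assumed input format)
ref = ["cake", "brownies"]
expr = ["cak", "cakee", "cake", "rownies", "browwnies"]
rows = pairwise_distances(ref, expr)
for r, e, d in rows:
    print(r, e, d)
```

Because every reference string is compared with every experimental string, the work grows as |ref| × |expr|, which matters for a very large dataset. In R, base `utils::adist(ref, expr)` returns the same kind of reference × experimental distance matrix (a generalized Levenshtein distance, with all edit costs equal to 1 by default).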
After I have measured the Levenshtein distances, I want to put every string with a distance of less than 3 into one cluster, so that my data would look something like this:
ref        expr        distance   cluster
cake       cak         1          1
cake       cakee       1          1
cake       cake        0          1
brownies   rownies     1          2
brownies   browwnies   1          2
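A minimal sketch of the clustering step, assuming that "cluster" means: assign each experimental string to its closest reference string, and use that reference's 1-based position as the cluster number whenever the distance is below 3. This is only one interpretation (a single-linkage clustering over all strings could group them differently). The sketch is in Python; `assign_clusters` and its `max_dist` parameter are hypothetical names, and strings with no reference within the threshold are kept with cluster `None` as an assumed convention.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def assign_clusters(ref, expr, max_dist=3):
    """Return (ref, expr, distance, cluster) rows, one per experimental string.

    Each experimental string is matched to its nearest reference string
    (ties go to the earlier reference). The cluster is the 1-based index of
    that reference if the distance is < max_dist, otherwise None.
    """
    rows = []
    for e in expr:
        dists = [levenshtein(r, e) for r in ref]
        best = min(range(len(ref)), key=lambda k: dists[k])
        cluster = best + 1 if dists[best] < max_dist else None
        rows.append((ref[best], e, dists[best], cluster))
    return rows


# Example strings taken from the tables in the question (assumed input format)
ref = ["cake", "brownies"]
expr = ["cak", "cakee", "cake", "rownies", "browwnies"]
for row in assign_clusters(ref, expr):
    print(*row)
```

Using the reference index as the cluster number keeps the cluster labels stable when new experimental strings are added, which matches the shape of the desired table above.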
Any help or advice on how to move forward is appreciated. At the moment I am trying a lot
of R packages to compute the distance between the data.frames, such as
but they do not seem to work well.