Hi,
I would like to understand how to interpret the output of the stringdist() function from the stringdist package.
I am doing fuzzy string matching with the stringdist package on a list of 6 fruit names. Please find the file below.
fruits.pdf
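I ran the stringdist() function using the code below:

```r
library(stringdist)

# Read the fruit names; assumes the names are in the first column of fruits.csv
x <- read.csv("fruits.csv", stringsAsFactors = FALSE)
df1 <- data.frame(seqid = seq_len(nrow(x)), name = x[[1]], stringsAsFactors = FALSE)
df1

# All ordered pairs of names (6 x 6 = 36 rows). Building the grid from the
# character vector avoids relying on factor levels, which read.csv() no longer
# creates by default since R 4.0.0 (levels() would then return NULL).
ndf <- expand.grid(n1 = df1$name, n2 = df1$name, stringsAsFactors = FALSE)
ndf <- ndf[order(ndf$n1), ]
View(ndf)

# One column of distances per method (qgram, cosine and jaccard use the default q = 1)
method_list <- c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex")
for (i in method_list) {
  ndf[, i] <- stringdist(ndf$n1, ndf$n2, method = i)
}

# Pairs that are close, but not identical, by cosine distance and q-gram count
suspicious_match <- ndf[ndf$cosine < 0.20 & ndf$cosine != 0 & ndf$qgram < 10, ]
suspicious_match <- suspicious_match[order(suspicious_match$n1, suspicious_match$cosine), ]
View(suspicious_match)
```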
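For reference, the suspicious_match filter relies on the q-gram family of distances. The base-R sketch below (the helpers qgram_profile, qgram_vectors, qgram_dist, cosine_dist and jaccard_dist are hypothetical, not part of stringdist) follows the definitions as I understand them from the stringdist documentation, with the default q = 1: each string becomes a vector of q-gram counts; qgram sums the absolute count differences; cosine is 1 minus the cosine similarity of the count vectors; jaccard is 1 minus the share of distinct q-grams the two strings have in common. It is a sketch for interpretation, not the package's implementation.

```r
# Sketch of q-gram based distances (assumed to mirror stringdist's definitions;
# helper names are hypothetical, not part of the stringdist package)

# Count every substring of length q in s, returned as a named numeric vector
qgram_profile <- function(s, q = 1) {
  n <- nchar(s)
  if (n < q) return(setNames(numeric(0), character(0)))
  tab <- table(substring(s, seq_len(n - q + 1), seq(q, n)))
  setNames(as.vector(tab), names(tab))
}

# Align both profiles on the union of their q-grams (missing counts become 0)
qgram_vectors <- function(a, b, q = 1) {
  pa <- qgram_profile(a, q)
  pb <- qgram_profile(b, q)
  grams <- union(names(pa), names(pb))
  va <- unname(pa[grams]); va[is.na(va)] <- 0
  vb <- unname(pb[grams]); vb[is.na(vb)] <- 0
  list(a = va, b = vb)
}

# qgram: total absolute difference in q-gram counts
qgram_dist <- function(a, b, q = 1) {
  v <- qgram_vectors(a, b, q)
  sum(abs(v$a - v$b))
}

# cosine: 1 - cosine similarity of the count vectors (0 = identical profiles)
cosine_dist <- function(a, b, q = 1) {
  v <- qgram_vectors(a, b, q)
  1 - sum(v$a * v$b) / (sqrt(sum(v$a^2)) * sqrt(sum(v$b^2)))
}

# jaccard: 1 - (shared distinct q-grams) / (all distinct q-grams)
jaccard_dist <- function(a, b, q = 1) {
  v <- qgram_vectors(a, b, q)
  1 - sum(v$a > 0 & v$b > 0) / length(v$a)
}

qgram_dist("Grapes green seedless", "grapes seedless red")
cosine_dist("Grapes green seedless", "grapes seedless red")
```

With q = 1 only letter counts matter, so anagrams such as "abc" and "cba" get a qgram and cosine distance of 0. Passing a larger q (for example stringdist(a, b, method = "cosine", q = 2)) makes letter order count as well.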
The code runs without errors, but I am having some difficulty interpreting the output.
For example, "Grapes green seedless" and "grapes seedless red" are different fruit names, but the soundex method reports them as the same (distance 0). What do the other methods (osa, lv, dl, hamming, etc.) say about this pair?
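The soundex result can be reproduced by hand. Soundex keeps the first letter and encodes only the next three consonant sounds, so for both names the code is fully determined by the shared first word "Grapes". Below is a simplified American Soundex in base R; soundex_code is a hypothetical helper, not stringdist's code, and stringdist's handling of spaces and other non-letters may differ. With method = "soundex", stringdist returns 0 when the two codes are equal and 1 otherwise, so both grape names (code G612 in this sketch) count as a match.

```r
# Simplified American Soundex (a sketch; not stringdist's exact implementation)
soundex_code <- function(s) {
  # Keep letters only, lower-cased; spaces and punctuation are dropped
  chars <- strsplit(gsub("[^a-z]", "", tolower(s)), "")[[1]]
  if (length(chars) == 0) return("")
  groups <- c(b = "1", f = "1", p = "1", v = "1",
              c = "2", g = "2", j = "2", k = "2", q = "2", s = "2", x = "2", z = "2",
              d = "3", t = "3", l = "4", m = "5", n = "5", r = "6",
              h = "-", w = "-")
  codes <- unname(groups[chars])
  codes[is.na(codes)] <- "0"            # vowels and y: separate repeated codes
  if (codes[1] == "-") codes[1] <- "0"  # an initial h or w carries no code
  codes <- codes[codes != "-"]          # h and w do not separate repeated codes
  codes <- rle(codes)$values            # collapse adjacent identical codes
  codes <- codes[-1]                    # the first letter is written as itself
  digits <- paste(codes[codes != "0"], collapse = "")
  # First letter plus the first three digits, padded with zeros
  paste0(toupper(chars[1]), substr(paste0(digits, "000"), 1, 3))
}

soundex_code("Grapes green seedless")
soundex_code("grapes seedless red")
```

This is why soundex is mainly useful for catching spelling variants of names that sound alike; it cannot separate two names that share their first few consonant sounds.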
I have searched for these methods online, but I still do not understand what each of them actually measures.
Could you explain how to interpret each method, so that I can tell that the two fruits above are different, not the same?
Any suggestions would be much appreciated.
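The edit-based methods can be sketched the same way. lv (Levenshtein) is the minimum number of single-character insertions, deletions and substitutions, and base R's adist() computes it. osa additionally allows swapping two adjacent characters at cost 1, provided no substring is edited more than once; dl (full Damerau-Levenshtein) drops that restriction, so "ca" to "abc" is 3 under osa but 2 under dl. hamming counts differing positions and is only defined for strings of equal length (stringdist returns Inf otherwise). lcs counts the characters that are not part of a longest common subsequence. jw returns a value between 0 (identical) and 1 (nothing in common); with the default p = 0 it is the plain Jaro distance. The functions osa_dist, hamming_dist and lcs_dist below are hypothetical helpers written from these definitions, not stringdist's implementation.

```r
# Sketch of edit-based distances (hypothetical helpers written from the
# textbook definitions; not the stringdist implementation)

# osa: Levenshtein plus adjacent transpositions, each substring edited at most once
osa_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  m <- length(x); n <- length(y)
  d <- matrix(0, m + 1, n + 1)
  d[, 1] <- 0:m  # cost of deleting all of a
  d[1, ] <- 0:n  # cost of inserting all of b
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      cost <- if (x[i] == y[j]) 0 else 1
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,  # deletion
                             d[i + 1, j] + 1,  # insertion
                             d[i, j] + cost)   # substitution (or match)
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)  # swap
      }
    }
  }
  d[m + 1, n + 1]
}

# hamming: number of positions that differ; undefined (Inf) for unequal lengths
hamming_dist <- function(a, b) {
  if (nchar(a) != nchar(b)) return(Inf)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# lcs: characters left unpaired by a longest common subsequence
lcs_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  L <- matrix(0, length(x) + 1, length(y) + 1)  # L[i+1, j+1]: LCS length of prefixes
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      L[i + 1, j + 1] <- if (x[i] == y[j]) L[i, j] + 1 else max(L[i, j + 1], L[i + 1, j])
    }
  }
  length(x) + length(y) - 2 * L[length(x) + 1, length(y) + 1]
}

a <- "Grapes green seedless"
b <- "grapes seedless red"
adist(a, b)[1, 1]   # lv (Levenshtein), from base R's utils package
osa_dist(a, b)
lcs_dist(a, b)
hamming_dist(a, b)  # Inf: the two strings have different lengths
```

For all of these, 0 means identical and larger values mean more edits. Because the raw counts grow with string length, one option when comparing pairs of different lengths is to divide by the length of the longer string before choosing a cut-off.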
Thanks,
snandy