Hi David, welcome!
You can use the tidystringdist package; take a look at this similar thread.
This would be one way to do it:
# Sample data
df <- data.frame(stringsAsFactors = FALSE,
                 Name = as.factor(c(" CANON PVT. LTD ", " Antila,Thomas ", " Greg ",
                                    " St.Luke's Hospital ", " Z_SANDSTONE COOLING LTD ",
                                    " St.Luke's Hospital ", " CANON PVT. LTD. ",
                                    " SANDSTONE COOLING LTD ", " Greg ", " ANTILA,THOMAS ")),
                 City = as.factor(c(" Georgia ", " Georgia ", " Georgia ", " Georgia ",
                                    " Georgia ", " Georgia ", " Georgia ", " Georgia ",
                                    " Georgia ", " Georgia ")))
match <- df %>%
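In case it helps to see the general idea end to end, here is a minimal sketch of pairwise fuzzy matching on names like the ones in the sample data. Note that it does not use tidystringdist: it swaps in base R's `adist()` (generalised Levenshtein distance) so it runs without extra packages. The names `names_raw`, `pairs`, `close_matches` and the `threshold` of 2 are my own assumptions for illustration, not part of the pipeline above.

```r
# A subset of the names from the sample data (whitespace kept as posted)
names_raw <- c(" CANON PVT. LTD ", " Antila,Thomas ", " Greg ",
               " St.Luke's Hospital ", " Z_SANDSTONE COOLING LTD ",
               " CANON PVT. LTD. ", " SANDSTONE COOLING LTD ",
               " ANTILA,THOMAS ")

# Normalise: trim whitespace and upper-case, then drop exact duplicates
names_clean <- unique(toupper(trimws(names_raw)))

# Build every unordered pair of distinct names
pairs <- as.data.frame(t(combn(names_clean, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("V1", "V2")

# Edit distance for each pair, computed with base R's adist()
pairs$dist <- mapply(function(a, b) drop(utils::adist(a, b)),
                     pairs$V1, pairs$V2, USE.NAMES = FALSE)

# Keep only the pairs within the assumed threshold
threshold <- 2
close_matches <- pairs[pairs$dist <= threshold, ]
print(close_matches)
```

The threshold is something you would tune for your own data; a small value only catches near-identical spellings, while a larger one starts to pair unrelated names.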
If you need more specific help, please provide a minimal REPRoducible EXample (reprex). A reprex makes it much easier for others to understand your issue and figure out how to help.
If you've never heard of a reprex before, you might want to start by reading this FAQ:
A minimal reproducible example consists of the following items:
A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset and includes the necessary information on the packages used.
Let's quickly go over each one of these with examples:
Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted into a post, but big enough to reproduce your issue.
Let's say, as an example, that you are working with the iris data frame:
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2…
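One common way to turn a data frame like this into something you can paste into a post is base R's `dput()`, which prints R code that recreates the object. The sketch below is only an illustration under that assumption; `small_iris` and `rebuilt` are hypothetical names, and the 5-row slice is an arbitrary choice.

```r
# Take a small slice of the built-in iris data frame
small_iris <- head(iris, 5)

# dput() prints R code that recreates the object; paste that output in your post
dput(small_iris)

# Round-trip check: rebuilding from the deparsed text gives an equal data frame
rebuilt <- eval(parse(text = paste(deparse(small_iris), collapse = "\n")))
isTRUE(all.equal(small_iris, rebuilt))
```

Anyone reading the post can then assign the pasted `structure(...)` output to a variable and work with the same data you have.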