Another option is the tidystringdist package; take a look at this related thread for an example.
I have a table that contains vendor names along with other details such as address, telephone number, etc. I need to identify vendors whose names are similar to each other. I was successful in finding exact duplicate vendors, but fuzzy duplicates are harder to catch. Here is a sample data set:
| Name | City |
|-------------------------|:-------:|
| CANON PVT. LTD | Georgia |
| Antila,Thomas | Georgia |
| Greg | Georgia |
| St.Luke's Hospital | Georgia |
| Z_SANDSTONE COOLING LTD | Georgia |
| St.Luke's Hospital | Georgia |
| CANON PVT. LTD. | Georgia |
| SANDSTONE COOLING LTD | Georgia |
| Gr…
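As a rough illustration of the fuzzy-matching idea, here is a minimal sketch in base R. It is not the tidystringdist approach from the linked thread; it swaps in `utils::adist()` (Levenshtein edit distance) so that no extra packages are needed. The names are the distinct, fully visible ones from the sample table, and the `max_edits` cutoff is a hypothetical value you would need to tune for your own data.

```r
# Distinct vendor names copied from the sample table in the question
vendors <- c(
  "CANON PVT. LTD",
  "Antila,Thomas",
  "Greg",
  "St.Luke's Hospital",
  "Z_SANDSTONE COOLING LTD",
  "CANON PVT. LTD.",
  "SANDSTONE COOLING LTD"
)

# Pairwise Levenshtein (edit) distances, ignoring case
d <- utils::adist(vendors, ignore.case = TRUE)

# Keep each unordered pair once by taking the upper triangle
idx <- which(upper.tri(d), arr.ind = TRUE)
pairs <- data.frame(
  name_1   = vendors[idx[, "row"]],
  name_2   = vendors[idx[, "col"]],
  distance = d[idx],
  stringsAsFactors = FALSE
)

# Hypothetical cutoff (an assumption): pairs within 2 edits are candidates
max_edits <- 2
candidates <- pairs[pairs$distance <= max_edits, ]
candidates <- candidates[order(candidates$distance), ]
print(candidates)
```

On this sample, the cutoff should flag the two CANON spellings (1 edit apart) and the two SANDSTONE spellings (2 edits apart). An absolute cutoff tends to over-match short names, so on real data a distance scaled by name length may work better.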
If you need more specific help please provide a minimal REPRoducible EXample (reprex) illustrating your issue. A reprex makes it much easier for others to understand your issue and figure out how to help.
If you've never heard of a reprex before, you might want to start by reading this FAQ:
A minimal reproducible example consists of the following items:
- A minimal dataset, necessary to reproduce the issue
- The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.
Let's quickly go over each one of these with examples:
Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue.
Let's say, as an example, that you are working with the iris data frame
head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…
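One common way to get a data frame like this into a post is sketched below, under the assumption that a handful of rows and columns is enough to show the problem. `dput()` is base R and prints code that recreates an object; the `sample_df` and `vendors_df` names and the column choices are only illustrative.

```r
# Keep only the first few rows and the columns needed to show the issue
sample_df <- head(iris, 5)[, c("Sepal.Length", "Species")]

# dput() prints R code that others can paste to recreate the object
dput(sample_df)

# Alternatively, type a small data frame by hand
# (values taken from the vendor table in the related thread)
vendors_df <- data.frame(
  Name = c("CANON PVT. LTD", "CANON PVT. LTD."),
  City = c("Georgia", "Georgia"),
  stringsAsFactors = FALSE
)
print(vendors_df)
```

Either form can be pasted straight into a post, so readers can run your code without access to your original file.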