I have the following dataset in R:
address = c("44 Ocean Road Atlanta Georgia", "882 4N Road River NY, NY 12345",
            "882 - River Road NY, ZIP 12345", "123 Fake Road Boston Drive Boston",
            "123 Fake - Rd Boston 56789", "3665 Apt 5 Moon Crs",
            "3665 Unit Moon Crescent", "NO ADDRESS PROVIDED", "31 Silver Way Road",
            "1800 Orleans St, Baltimore, MD 21287, United States",
            "1799 Orlans Street, Maryland , USA")
name = c("Pancake House of America", "ABC Center Building", "Cent. Bldg ABC",
         "BD Home 25 New", "Boarding Direct 25", "Pine Recreational Center",
         "Pine Rec. cntR", "Boston Swimming Complex", "boston gym center",
         "mas hospital", "Massachusetts Hospital")
blocking_var = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3)
my_data = data.frame(address, name, blocking_var)
The data looks something like this:
> my_data
                                               address                     name blocking_var
1                        44 Ocean Road Atlanta Georgia Pancake House of America            1
2                       882 4N Road River NY, NY 12345      ABC Center Building            1
3                       882 - River Road NY, ZIP 12345           Cent. Bldg ABC            1
4                    123 Fake Road Boston Drive Boston           BD Home 25 New            1
5                           123 Fake - Rd Boston 56789       Boarding Direct 25            1
6                                  3665 Apt 5 Moon Crs Pine Recreational Center            2
7                              3665 Unit Moon Crescent           Pine Rec. cntR            2
8                                  NO ADDRESS PROVIDED  Boston Swimming Complex            2
9                                   31 Silver Way Road        boston gym center            2
10 1800 Orleans St, Baltimore, MD 21287, United States             mas hospital            3
11                  1799 Orlans Street, Maryland , USA   Massachusetts Hospital            3
I am trying to follow this R tutorial (https://cran.r-project.org/web/packages/RecordLinkage/vignettes/WeightBased.pdf) to learn how to remove duplicates based on fuzzy conditions. The goal is to deduplicate within each "block" (records sharing the same blocking_var): keep every unique record and, where records are fuzzy duplicates of one another, keep only the first occurrence.
I tried the following code:
library(RecordLinkage)
pairs = compare.dedup(my_data, blockfld = 3)
But when I inspect the results, everything is NA. Given this, I think I am doing something wrong, and there seems to be little point in continuing with the tutorial until the problem is resolved.
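For reference, here is a minimal sketch of how I understand the fuzzy variant of the call and its inspection, based on the package documentation; it is not a confirmed fix. It rebuilds my_data so the block runs on its own, uses the strcmp argument of compare.dedup to request string similarity on columns 1 and 2 (jarowinkler is the documented default for strcmpfun), and the name pairs_fuzzy is purely illustrative. Since I pass no identity vector, my assumption is that the is_match column is expected to be NA, which may be part of what I am seeing.

```r
library(RecordLinkage)

# Rebuild the example data so this block is self-contained.
address = c("44 Ocean Road Atlanta Georgia", "882 4N Road River NY, NY 12345",
            "882 - River Road NY, ZIP 12345", "123 Fake Road Boston Drive Boston",
            "123 Fake - Rd Boston 56789", "3665 Apt 5 Moon Crs",
            "3665 Unit Moon Crescent", "NO ADDRESS PROVIDED", "31 Silver Way Road",
            "1800 Orleans St, Baltimore, MD 21287, United States",
            "1799 Orlans Street, Maryland , USA")
name = c("Pancake House of America", "ABC Center Building", "Cent. Bldg ABC",
         "BD Home 25 New", "Boarding Direct 25", "Pine Recreational Center",
         "Pine Rec. cntR", "Boston Swimming Complex", "boston gym center",
         "mas hospital", "Massachusetts Hospital")
blocking_var = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3)
my_data = data.frame(address, name, blocking_var, stringsAsFactors = FALSE)

# Block on column 3 and compare columns 1 and 2 with a string metric
# (default strcmpfun) instead of exact equality.
pairs_fuzzy = compare.dedup(my_data, blockfld = 3, strcmp = 1:2)

# The candidate pairs and their per-column comparison values are in $pairs.
head(pairs_fuzzy$pairs)

# Assumption: is_match is only filled in when an identity vector is supplied,
# so NA values in that column alone would not indicate an error.
table(is.na(pairs_fuzzy$pairs$is_match))
```

If the address and name columns in pairs_fuzzy$pairs hold similarity values rather than NA, then only the is_match column is missing, and the comparison step itself would appear to be working.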
Can someone please show me how I can resolve this problem and continue on with the tutorial?
Thank you!