The problem is this: a doctor who receives payments from different companies must report where the money comes from. I have a list of officially recorded companies and a list of company names that the doctor reported. The goal of the code is to check whether the database's record is accurate. (The names are in no particular order in either the column or the list.) However, the doctor only writes part of each company name, so I have to use the agrepl function to obtain a vector of Boolean values.
The actual list and column are much larger; I have constructed a simplified example in the code below. I have tried varying the max.distance parameter. When max.distance is 5 or larger, I get 4 TRUE values; otherwise I get 4 FALSE values. I am not sure whether my code has a logic problem or whether I have not set max.distance properly. Any suggestions would be appreciated.
df <- data.frame(CompanyInDataBase = c('Pfizer Inc', 'Shire North America Group Inc', 'Roche Inc', 'Bayor Inc'),
                 stringsAsFactors = FALSE)
report <- c('Shire', 'Pfizer', 'Genetech')
for (i in 1:length(report)) {
  match <- agrepl(report[i], df$CompanyInDataBase, max.distance = 0.1)
}
I expect the output to be a vector of correct Boolean values, with the same length as CompanyInDataBase.
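To make the expected result concrete, here is a minimal sketch of what I am after, assuming each reported name is checked separately against the column and a company counts as matched if any reported name matches it. The names `hits` and `Reported` are made up for this illustration, and the any-match rule is my assumption about how the per-name results should be combined, not something I have confirmed is the right approach.

```r
df <- data.frame(
  CompanyInDataBase = c("Pfizer Inc", "Shire North America Group Inc",
                        "Roche Inc", "Bayor Inc"),
  stringsAsFactors = FALSE
)
report <- c("Shire", "Pfizer", "Genetech")

# Run agrepl once per reported name. Each call returns a logical vector
# with one entry per database company, so sapply yields a matrix with
# one row per company and one column per reported name.
hits <- sapply(report, function(name) {
  agrepl(name, df$CompanyInDataBase, max.distance = 0.1)
})

# Assumption: a company is "reported" if at least one name matched it.
# "Reported" is a hypothetical column name used only in this sketch.
df$Reported <- rowSums(hits) > 0

print(df)
```

Unlike my loop above, this keeps the result for every reported name instead of only the last one. On the sample data I would expect it to flag the Pfizer and Shire rows and leave the other two unflagged, but I have not checked how it behaves on the full list.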