Fuzzy Filtering in dplyr? Using agrep within filter?


Anybody have any good ways to filter for mostly similar values?

One thing I’m trying is agrep within filter(), but I need to loop through the agrep somehow, as I get this warning:

...argument 'pattern' has length > 1 and only the first element will be used

Here is an example of my code


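library(dplyr)     # filter(), tibble(), %>%
library(purrr)     # map_df()
library(spotifyr)  # assumed source of get_artist_audio_features()

Artist <- c("Eminem", "Spiritualized")
Album <- c("Revival", "Pure Phase")
mydata <- tibble(Artist, Album)  # data_frame() is deprecated in favour of tibble()

# Fetch one artist's tracks and keep the rows for the album I want
get_album_data <- function(x) {
  get_artist_audio_features(mydata$Artist[x]) %>%
    filter(agrepl(album_name, mydata$Album[x]) == TRUE)
}

# Return an empty data frame instead of failing when a lookup errors
try_get_album_data <- function(x) {
  tryCatch(get_album_data(x), error = function(e) data.frame())
}

map_df(seq(1, 2), try_get_album_data)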

Any ideas? Any suggestions are appreciated


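The warning in your case is because agrepl takes the pattern first and the vector second, so the call should be agrepl(mydata$Album[x], album_name). With the arguments swapped, the whole album_name column is passed as the pattern, and only its first element is used. (As a side note, you don’t need == TRUE in the expression.)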
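
To make the argument order concrete, here is a minimal sketch that does not call the Spotify API. The tracks data frame and its contents are made up for illustration and only stand in for whatever get_artist_audio_features() returns; target_album plays the role of mydata$Album[x].

```r
library(dplyr)

# Hypothetical stand-in for the data returned by get_artist_audio_features()
tracks <- tibble(
  album_name = c("Pure Phase", "Pure Phase (Remastered)", "Lazer Guided Melodies"),
  track_name = c("track A", "track B", "track C")
)

# A single string to search for (the role played by mydata$Album[x])
target_album <- "Pure Phase"

# Pattern (length 1) first, the column to search second
matched <- tracks %>%
  filter(agrepl(target_album, album_name))

print(matched)
```

Because agrepl looks for an approximate match of the pattern anywhere inside each string, the "(Remastered)" variant is kept as well. If that is too loose or too strict, the max.distance argument of agrepl is the knob to adjust.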

You also may like to check out my fuzzyjoin package, particularly stringdist_inner_join, which can join two data frames based on inexact string matching of columns.
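
As a rough sketch of what that could look like here: the wanted and available data frames below are invented for illustration, and max_dist = 1 is just a starting value to tune for your data.

```r
library(dplyr)
library(fuzzyjoin)

# The albums you are looking for (same shape as mydata in the question)
wanted <- tibble(
  Artist = c("Eminem", "Spiritualized"),
  Album  = c("Revival", "Pure Phase")
)

# Hypothetical album names as they might come back from a lookup
available <- tibble(
  album_name = c("Revival", "Pure Phaze", "Kamikaze")
)

# Keep pairs whose album names are within one edit of each other
joined <- stringdist_inner_join(
  wanted, available,
  by = c(Album = "album_name"),
  max_dist = 1
)

print(joined)
```

Note that stringdist_inner_join compares whole strings, so unlike agrepl it would not match "Pure Phase" against a longer name such as "Pure Phase (Remastered)" unless max_dist is raised accordingly.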


Oh my, that was so much simpler than I expected. You wouldn’t believe how many hours I spent thinking through this problem. Thanks a lot!