Fuzzy match only if exact match doesn't exist


#1

I’m trying to write a function to get album data from Spotify’s API for a data frame of albums and artists. Because there are some misspellings in the dataset, I need to use a fuzzy matching function (like agrepl).

However, some artists, like Absu, have albums that are, by agrepl's standards, the same. For example, Absu has an album named “Absu” and another named “Abzu”. I only want the data for one of them, but I end up with data for both. I know I can tighten max.distance in agrepl, but I need to keep it fairly permissive to account for larger misspellings.
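
To illustrate the problem without calling the Spotify API, here is a minimal base R check. The `albums` vector is a made-up stand-in for the `album_name` column; with agrepl's default `max.distance`, a four-letter pattern tolerates one edit, so both titles count as a match, while a plain `==` comparison picks out only one.

```r
# Made-up stand-in for the album_name column returned for Absu
albums <- c("Absu", "Abzu")

# Fuzzy comparison with agrepl's default max.distance
fuzzy <- agrepl("Abzu", albums, ignore.case = TRUE)

# Exact comparison for the same title
exact <- albums == "Abzu"

print(data.frame(albums, fuzzy, exact))
```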

Is there a pre-built function or an easy way to tell R:

if there is an exact match of album_name with mydata[["Album"]]: filter on it and move on
else: try to find a close match and filter on that?

Here’s something I’ve tried, but it doesn’t work:

get_album_data <- function(x) {
  
  get_artist_audio_features(mydata$Artist[x], return_closest_artist = TRUE) %>% 
    ifelse(album_name %in% mydata$Album[x],
           filter(mydata$Album[x] == album_name,
           filter(agrepl(mydata$Album[x], album_name, ignore.case = TRUE))))
  
}

This is what my code looks like without trying anything special:

library(dplyr)
library(spotifyr)
library(purrr)

# from Spotify's developer page
Sys.setenv(SPOTIFY_CLIENT_ID = "xxx")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "xxx")
access_token <- get_spotify_access_token()

Artist <- c("Spiritualized", "Fleet Foxes", "The Avalanches", "Absu")
Album <- c("Sweet Heart, Sweet Light", "Helplessness Blues", "Wildflower", "Abzu")

# tibble() replaces the deprecated data_frame()
mydata <- tibble(Artist, Album)

get_album_data <- function(x) {
  get_artist_audio_features(mydata[["Artist"]][x], return_closest_artist = TRUE) %>% 
    filter(agrepl(mydata[["Album"]][x], album_name, ignore.case = TRUE)) %>%
    mutate(mydata[["Artist"]][x])
}

Any ideas? Thanks

EDIT: I just asked the same question on StackOverflow, since I didn’t find an answer to the problem here.


#2

Not sure it will solve your problem, but note that there is a package for fuzzy matching called

It is currently not available on CRAN, but it should become available again soon.

There are several matching methods you can use. You could try them to see if one fits your needs.
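
For the exact-then-fuzzy logic described in the question, a minimal base R sketch could look like the following. The helper name `filter_album`, its `max_distance` argument, and the small `tracks` data frame are all hypothetical; `tracks` only stands in for what `get_artist_audio_features()` would return, and the sketch assumes an `album_name` column as used in the question.

```r
# Hypothetical helper: keep exact matches if there are any,
# otherwise fall back to fuzzy matching with agrepl().
filter_album <- function(tracks, album, max_distance = 0.1) {
  # Case-insensitive exact comparison
  exact <- tolower(tracks$album_name) == tolower(album)
  if (any(exact)) {
    return(tracks[exact, , drop = FALSE])
  }
  # No exact match found: try a close match instead
  fuzzy <- agrepl(album, tracks$album_name,
                  ignore.case = TRUE, max.distance = max_distance)
  tracks[fuzzy, , drop = FALSE]
}

# Made-up stand-in for the API result (not real Spotify data)
tracks <- data.frame(
  album_name = c("Absu", "Abzu", "Tara"),
  track_name = c("track a", "track b", "track c"),
  stringsAsFactors = FALSE
)

exact_hit <- filter_album(tracks, "Abzu")   # an exact match exists
fuzzy_hit <- filter_album(tracks, "Abzuu")  # misspelled title, fuzzy fallback
```

Because the data frame is the first argument, the helper should drop into the existing pipeline in place of the `filter(agrepl(...))` step, for example `get_artist_audio_features(...) %>% filter_album(mydata[["Album"]][x])`. Note that the fuzzy fallback can still return more than one album when no exact match exists.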