Map() systematically skipping items in data frame despite measures taken to avoid that

purrr

Using spotifyr, I'm trying to get Spotify audio data for thousands of albums. However, purrr::map_df() always seems to skip over certain albums, even though the function works when I call it on each of those albums individually.

Things I've tried that don't seem to be solving the problem:

  1. Sys.sleep() for pausing between scrapes
  2. warrenr::persistently() to retry an album if the scrape fails
  3. Splitting my data frame into 5 nearly even parts and using map_df() over each of them

I put an example of my code below. Bear in mind that this may not be entirely reproducible because my data frame is over 1400 observations. Any suggestions are greatly appreciated!

library(spotifyr)
library(tidyverse)
library(stringdist)
library(warrenr) #devtools::install_github("ijlyttle/warrenr") if not installed

Sys.setenv(SPOTIFY_CLIENT_ID = "xxx") # from Spotify's API page
Sys.setenv(SPOTIFY_CLIENT_SECRET = "xxx") # from Spotify's API page

access_token <- get_spotify_access_token()

Artist <- c("Spiritualized", "Fleet Foxes", "Gucci Mane", "Gucci Mane", "Iron Maiden", "Ween")
Album <- c("Sweet Heart, Sweet Light", "Helplessness Blues", "Mr. Davis", "Everybody Looking", "The Final Frontier", "Quebec")
my_data <- data_frame(Artist, Album)

# these specific Fleet Foxes, Gucci Mane, and Iron Maiden albums tend to be skipped over when scraped in bulk
my_data
#> # A tibble: 6 x 2
#>   Artist        Album                   
#>   <chr>         <chr>                   
#> 1 Spiritualized Sweet Heart, Sweet Light
#> 2 Fleet Foxes   Helplessness Blues      
#> 3 Gucci Mane    Mr. Davis               
#> 4 Gucci Mane    Everybody Looking       
#> 5 Iron Maiden   The Final Frontier      
#> 6 Ween          Quebec

closest_match <- function(string, string_vector){
  string_vector[amatch(tolower(string), 
                       tolower(string_vector), 
                       maxDist = 6, 
                       method = "lv", 
                       weight = c(d = 1, i = 0.1, s = 1))]
}

# sets up progress bar
pb_1 <- my_data %>%
  tally() %>%
  progress_estimated(min_time = 0)

# gets album data by data frame row number
get_album_data <- function(row_num) {
  
  pb_1$tick()$print()
  
  seq(3, 5, by = 0.001) %>%
    sample(1) %>%
    Sys.sleep()
  
  get_artist_audio_features(my_data$Artist[row_num], return_closest_artist = TRUE) %>% 
    filter(album_name == closest_match(my_data$Album[row_num], album_name))
}

# retries scraping up to 15 times if it doesn't work
persistently_get_album_data <- warrenr::persistently(get_album_data, max_attempts = 15, wait_seconds = 0)

# if scraping doesn't work, moves on
try_get_album_data <- function(row_num) {
  tryCatch(persistently_get_album_data(row_num), error = function(e) {data.frame()})
}

# with ~1400 observations, the Fleet Foxes, Gucci Mane, and Iron Maiden albums listed above are typically skipped
audio_data <- 1:nrow(my_data) %>% map_df(try_get_album_data)

# probably will work
trick_audio_data <- 3 %>% map_df(try_get_album_data)

Created on 2018-04-10 by the reprex package (v0.2.0).
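
For what it's worth, the code above can drop a row in two ways without any visible error: tryCatch() swaps an error for an empty data.frame(), and filter() returns zero rows when amatch() finds nothing within maxDist and returns NA. Below is a minimal sketch of how each row's outcome might be logged instead of silently dropped. Note that fake_get_album_data() and check_row() are hypothetical stand-ins made up for illustration, not part of spotifyr, and the simulated failures on rows 2 and 3 are assumptions, not what Spotify actually returns.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for get_album_data(): row 2 throws an error,
# row 3 returns zero rows (as filter() would after a failed match).
fake_get_album_data <- function(row_num) {
  if (row_num == 2) stop("simulated API failure")
  if (row_num == 3) return(tibble(album_name = character(), tempo = numeric()))
  tibble(album_name = paste("album", row_num), tempo = 100 + row_num)
}

# Returns exactly one log record per input row, so nothing can vanish.
check_row <- function(row_num) {
  res <- tryCatch(fake_get_album_data(row_num), error = function(e) e)
  if (inherits(res, "error")) {
    tibble(row_num = row_num, status = "error", n_rows = 0L,
           message = conditionMessage(res))
  } else {
    tibble(row_num = row_num,
           status = if (nrow(res) == 0) "no_rows" else "ok",
           n_rows = nrow(res),
           message = NA_character_)
  }
}

row_log <- map_df(1:4, check_row)

# Rows that would have been skipped, with the reason for each
skipped <- filter(row_log, status != "ok")
print(skipped)
```

If the skipped albums show up as "no_rows" rather than "error", retrying would not help, which might explain why persistently() makes no difference.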