When I do web scraping or make many API calls, I often get throttled at some point. This doesn't break my code, because I wrap my functions with
purrr::safely, but I still eventually need to go back and retry the missing entries. My question is: what's the most efficient way to go back and retry these NULLs?
For example, suppose I tried:
```r
library(tidyverse)
library(geonames)

options(geonamesUsername = "XXXX")
options(geonamesHost = "api.geonames.org")

find_zip_code <- safely(GNfindNearbyPostalCodes)

zipcodes <- tibble(
  locationlatitude  = c(43.142, 45.015, 34.296, 40.714, 40.661),
  locationlongitude = c(-85.049, -93.340, -80.113, -75.032, -74.012),
  zipcode           = list("29079", "55422", "48834", NULL, NULL)
)

### How can I selectively retry the NULLs?
## My usual method would be to filter for NULLs and join later.

## 1) my normal method of mapping works
zipcodes %>%
  mutate(
    zip = map2(
      locationlatitude, locationlongitude,
      ~ find_zip_code(lat = .x, lng = .y, maxRows = 1)
    )
  )

## 2) I could try some map_if method to target NULLs, but this fails
retry_df2 <- zipcodes %>%
  nest(coord = c(locationlatitude, locationlongitude)) %>%
  mutate(
    zip = map_if(
      coord,
      .p = is.null(zipcode),
      .f = ~ find_zip_code(
        lat = .x$locationlatitude,
        lng = .x$locationlongitude,
        maxRows = 1
      ),
      .else = list(F)
    )
  )
#> Error: Problem with `mutate()` input `zip`.
#> x length(.p) == length(.x) is not TRUE
#> ℹ Input `zip` is `map_if(...)`.
```
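For reference, the `length(.p) == length(.x)` error above comes from `is.null(zipcode)` returning a single `FALSE` for the whole list column rather than one flag per row; `map_if()` wants a logical vector as long as `.x`. A minimal sketch of the corrected call, with a hypothetical stub standing in for the real geonames lookup so it runs offline (the stub and its `"10001"` return value are made up for illustration):

```r
library(tidyverse)

# Hypothetical stub mimicking safely(GNfindNearbyPostalCodes);
# a real run would hit the geonames API instead.
find_zip_code <- function(lat, lng, maxRows = 1) {
  list(result = "10001", error = NULL)
}

zipcodes <- tibble(
  locationlatitude  = c(43.142, 40.714),
  locationlongitude = c(-85.049, -75.032),
  zipcode           = list("29079", NULL)
)

retry_df2 <- zipcodes %>%
  nest(coord = c(locationlatitude, locationlongitude)) %>%
  mutate(
    zip = map_if(
      coord,
      .p = map_lgl(zipcode, is.null),   # one TRUE/FALSE per row, not a single flag
      .f = ~ find_zip_code(
        lat = .x$locationlatitude,
        lng = .x$locationlongitude,
        maxRows = 1
      )$result
    )
  )
```

Rows where `.p` is `FALSE` are left unchanged by `map_if()`, so their `zip` entries still hold the nested `coord` tibbles; only the NULL rows are looked up.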
Created on 2020-11-22 by the reprex package (v0.3.0)
It is easy enough to filter and then rejoin, but that feels clumsy, and I am wondering if there is a more elegant solution.
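One way to avoid the filter-and-rejoin round trip entirely is `pmap()` with the NULL check inside the function, so only the missing rows are retried in place. A minimal sketch under the same assumption as above: a hypothetical stub (returning a made-up `"10001"`) stands in for the real geonames call so the example is self-contained:

```r
library(tidyverse)

# Hypothetical stub mimicking safely(GNfindNearbyPostalCodes);
# a real run would call the geonames API instead.
find_zip_code <- function(lat, lng, maxRows = 1) {
  list(result = "10001", error = NULL)
}

zipcodes <- tibble(
  locationlatitude  = c(43.142, 40.714),
  locationlongitude = c(-85.049, -75.032),
  zipcode           = list("29079", NULL)
)

# Retry in place: only rows whose zipcode is NULL hit the API again;
# everything else passes through untouched -- no filter/join needed.
retried <- zipcodes %>%
  mutate(zipcode = pmap(
    list(zipcode, locationlatitude, locationlongitude),
    function(zip, lat, lng) {
      if (is.null(zip)) {
        find_zip_code(lat = lat, lng = lng, maxRows = 1)$result
      } else {
        zip
      }
    }
  ))
```

Because the result overwrites the `zipcode` column directly, repeated throttling can be handled by just re-running the same `mutate()` until no NULLs remain.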