When I do web scraping or make many API calls, I often get throttled at some point. This doesn't break my code because I wrap my functions with purrr::safely, but I still eventually need to go back and retry the missing entries. My question: what is the most efficient way to go back and retry just these NULLs?
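For context, safely() wraps a function so that it returns a list with $result and $error components instead of throwing; a throttled call leaves NULL in $result. A minimal sketch with a made-up flaky_api function (purely illustrative, not the real geonames call):

```r
library(purrr)

# Hypothetical stand-in for a rate-limited API call
flaky_api <- function(x) {
  if (x > 2) stop("429 Too Many Requests")
  x * 10
}

safe_api <- safely(flaky_api)
results  <- map(1:4, safe_api)

# Each element is list(result = ..., error = ...);
# the throttled calls (3 and 4 here) leave result = NULL
map(results, "result")
```

Those NULL results are exactly the entries I want to selectively retry.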
For example, suppose I tried:
library(tidyverse)
library(geonames)
options(geonamesUsername="XXXX")
options(geonamesHost="api.geonames.org")
find_zip_code <- safely(GNfindNearbyPostalCodes)
zipcodes <- tibble(
  locationlatitude = c(
    43.142, 45.015,
    34.296, 40.714, 40.661
  ),
  locationlongitude = c(
    -85.049, -93.340,
    -80.113, -75.032, -74.012
  ),
  zipcode = list("29079", "55422", "48834", NULL, NULL)
)
### how can I selectively retry the NULLs?
## my usual method would be to filter for NULLs and join them back later
## 1) my normal method of mapping works
zipcodes %>%
  mutate(
    zip = map2(locationlatitude,
               locationlongitude,
               ~ find_zip_code(
                 lat = .x,
                 lng = .y,
                 maxRows = 1))
  )
## 2) I could try some map_if method to target NULLs but this fails
retry_df2 <- zipcodes %>%
  nest(coord = c(locationlatitude, locationlongitude)) %>%
  mutate(
    zip = map_if(coord,
                 .p = is.null(zipcode),
                 .f = ~ find_zip_code(
                   lat = .x$locationlatitude,
                   lng = .x$locationlongitude,
                   maxRows = 1),
                 .else = list(F))
  )
#> Error: Problem with `mutate()` input `zip`.
#> x length(.p) == length(.x) is not TRUE
#> ℹ Input `zip` is `map_if(...)`.
Created on 2020-11-22 by the reprex package (v0.3.0)
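Reading the error, .p apparently has to be a logical vector the same length as .x (or a predicate applied to each element of .x, which here is coord, not zipcode). Precomputing the predicate with map_lgl(zipcode, is.null) might get the map_if route working; an untested sketch:

```r
retry_df2 <- zipcodes %>%
  nest(coord = c(locationlatitude, locationlongitude)) %>%
  mutate(
    zip = map_if(coord,
                 # logical vector, one entry per row
                 .p = map_lgl(zipcode, is.null),
                 .f = ~ find_zip_code(
                   lat = .x$locationlatitude,
                   lng = .x$locationlongitude,
                   maxRows = 1),
                 # rows that already have a zipcode get NULL here
                 .else = ~ NULL)
  )
```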
It is easy enough to filter and then rejoin, but this feels clumsy, and I am wondering if there is a more elegant solution.
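For reference, the filter-and-rejoin pattern I mean looks roughly like this (assuming the safely output can be unwrapped with $result; the exact shape of GNfindNearbyPostalCodes's return value may differ):

```r
# retry only the rows whose zipcode is NULL
needs_retry <- zipcodes %>%
  filter(map_lgl(zipcode, is.null)) %>%
  mutate(
    zipcode = map2(locationlatitude, locationlongitude,
                   ~ find_zip_code(lat = .x, lng = .y, maxRows = 1)$result)
  )

# stitch the retried rows back onto the rows that already succeeded
zipcodes_fixed <- zipcodes %>%
  filter(!map_lgl(zipcode, is.null)) %>%
  bind_rows(needs_retry)
```

It works, but splitting and re-binding the data frame every time I get throttled is exactly the clumsiness I'd like to avoid.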