Asynchronous API calls with curl


Hi all!

We're building up our ETL jobs on RStudio Connect, and we have some long-running jobs that I would like to speed up by making the API calls in parallel rather than sequentially, which is what happens when I iterate through a list with purrr or use a simple for loop.

While searching for a good solution I came across this curl vignette, specifically its 'async requests' section. I thought: "awesome, somebody built it for me, I just need to adjust it to my needs"... I guess I couldn't have been more wrong :stuck_out_tongue:

I kept searching and found this implementation, but it's not directly applicable to my needs, and I also need to do a POST. Below is a short summary of what I've tried so far, unsuccessfully:

A single POST handle. In practice the body will need to change with each request, but let's not go that far for now. Adapting the handle to a POST was not an issue; my question is more about how to feed that handle correctly into the rest of the pipeline.

library(curl)
library(jsonlite)
library(magrittr)
h <- new_handle(
  copypostfields = toJSON(body, pretty = TRUE, auto_unbox = TRUE)
) %>%
  handle_setheaders(
    `Authorization` = "XXX",
    `Content-Type` = "application/json"
  )

Now let's say I want to make 3 asynchronous calls using that same handle (or versions of it, but let's get there in a minute) against the same server, so I repeat the API URL 3 times.

pool <- new_pool()

# Results are only available through the callback function
cb <- function(req) {
  cat("done:", req$url, ": HTTP status:", req$status_code, "\n",
      "content:", rawToChar(req$content), "\n")
}

# Example vector of URIs to loop through
uris <- c(...)  # the same API URL, repeated 3 times
sapply(uris, curl_fetch_multi, done = cb, pool = pool)
out <- multi_run(pool = pool)

At this point the requests should run, but instead of a great result I get one of two errors:

  1. Just a 404, because the handle I defined above is not tied to any of those calls (each one is just a generic curl GET request)

  2. If I change the last lines to tie the handle in:

sapply(uris, curl_fetch_multi, done = cb, pool = pool, handle = h)
out <- multi_run(pool = pool)
Error in multi_add(handle = handle, done = done, fail = fail, data = data,  : 
  Handle is locked. Probably in use in a connection or async request.

The documentation even says that a handle can't be used more than once, but I'm having trouble understanding how to organise this pipeline of asynchronous calls the right way. Has anyone come across a similar issue and found a viable solution?
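
The locked-handle error is consistent with that documentation: once a handle has been added to a pool it stays locked until its request completes, so every request needs its own handle. Below is a minimal sketch using only curl, assuming the bodies are already JSON strings. `queue_posts()`, the example bodies and the `XXX` token are hypothetical placeholders; the URL is the one used later in this thread.

```r
library(curl)

# Hypothetical helper (not part of curl): queue one POST per body on `pool`.
# Each request gets its OWN handle, because a handle that has been added to
# a pool stays locked until its request finishes.
queue_posts <- function(url, bodies, headers, pool) {
  results <- new.env()  # callbacks cannot return values, so they store here

  queue_one <- function(id, body) {
    force(id)  # evaluate now, not when the callback fires after the loop
    h <- new_handle(copypostfields = body)  # fresh handle for this request
    handle_setheaders(h, .list = headers)
    curl_fetch_multi(
      url, handle = h, pool = pool,
      done = function(res) {
        assign(id, list(status = res$status_code,
                        content = rawToChar(res$content)), envir = results)
      },
      fail = function(msg) {
        assign(id, list(status = NA_integer_, content = msg), envir = results)
      }
    )
  }

  for (i in seq_along(bodies)) queue_one(as.character(i), bodies[[i]])
  results
}

# Example with three placeholder bodies; nothing is sent yet
pool <- new_pool()
bodies <- c('{"client": 1}', '{"client": 2}', '{"client": 3}')
hdr <- list(Authorization = "XXX", `Content-Type` = "application/json")
results <- queue_posts("https://api-endpoint/calculate", bodies, hdr, pool)
length(multi_list(pool))  # number of requests queued in the pool
```

Calling `multi_run(pool = pool)` afterwards sends all queued requests concurrently, and `as.list(results)` then holds one status/content entry per request, keyed by its position in `bodies`.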





For async requests, you can take a look at the crul :package: too



Oh, wow - I wasn't aware of that, and I have to admit it's absolutely fantastic stuff!

I eventually put together a solution that calls the same API on RStudio Connect with a list of varying bodies; on the Connect side I also made sure that a large number of processes can be spawned quickly to serve the requests. The goal is to process hundreds of client JSONs against the same API as quickly as possible.

The full mini-solution is below and I hope somebody will find it useful!


library(crul)
library(jsonlite)

bdy <- read_json("body.json")

url <- "https://api-endpoint/calculate"
hdr <- list(
  `Authorization` = "XXX",
  `Content-Type` = "application/json"
)

# Pretending I have a list of 5 different bodies to post
bdy_list <- rep(toJSON(bdy, pretty = TRUE, auto_unbox = TRUE), 5)

# Creating a list of individual requests with varying bodies
req <- vector("list", length(bdy_list))
for (i in seq_along(bdy_list)) {
  req[[i]] <- HttpRequest$new(url = url, headers = hdr)$post(body = bdy_list[[i]])
}

# Execution: AsyncVaried$new() only collects the requests,
# $request() sends them concurrently
res <- AsyncVaried$new(.list = req)
res$request()
res$parse()  # response bodies, in the same order as req

Another approach that also worked very well for us was to wrap a single POST call in a function and iterate over the list of bodies with furrr. That is probably even better, because we could keep our results tidy in a data frame and work off them afterwards.
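
A minimal sketch of that furrr approach, assuming the future and furrr packages are installed. `post_one()` and `post_all()` are hypothetical helpers, not the code used in this thread; the request function is passed in as an argument so it can be swapped out, for example for a stub when testing.

```r
library(future)
library(furrr)

# Hypothetical helper: one synchronous POST returned as a one-row data frame.
# The curl handle is created inside the worker, since handles are external
# pointers and cannot be sent between R sessions.
post_one <- function(body, url, headers) {
  h <- curl::new_handle(copypostfields = body)
  curl::handle_setheaders(h, .list = headers)
  res <- curl::curl_fetch_memory(url, handle = h)
  data.frame(status = res$status_code,
             content = rawToChar(res$content),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: apply post_fun to every body in parallel R sessions
# and return one data frame with a row per request, in input order.
post_all <- function(bodies, post_fun, ..., workers = 4) {
  old_plan <- plan(multisession, workers = workers)
  on.exit(plan(old_plan), add = TRUE)  # restore the caller's plan afterwards
  rows <- future_map(bodies, post_fun, ...)  # extra args go to post_fun
  out <- do.call(rbind, rows)
  out$request <- seq_along(bodies)
  out
}
```

With the objects from the crul example, a call would look like `post_all(bdy_list, post_one, url = url, headers = hdr)`. Rows are combined with base `do.call(rbind, ...)`; `future_map_dfr()` is an alternative.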