Hi all!
We're building up our ETL jobs on R Studio Connect and we have some long-running jobs that I would like to speed up by making the API calls in parallel rather than sequentially, which is what happens when I iterate through a list with purrr or a simple for loop.
In my search for a good solution I came across this curl vignette and specifically the 'async requests' part of it. I thought: "awesome, somebody built it for me, I just need to adjust it to my needs"... I guess I couldn't have been more wrong.
I started searching around and found this implementation link, but the problem is that it's not directly applicable to my needs, and I need to do a POST. Below I'll briefly list what I have been trying to implement, unsuccessfully so far:
A single POST handler - theoretically the body will need to change with each list but for now let's not go that far. Adjusting that to a POST was not an issue, the question for me is more how to feed that handler correctly into the remaining pipeline.
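library(curl)
library(jsonlite)
library(magrittr)  # for %>%

# `body` is the request payload (an R list) defined elsewhere
h <- new_handle(
  copypostfields = toJSON(body, pretty = TRUE, auto_unbox = TRUE)
) %>%
  handle_setheaders(
    `Authorization` = "XXX",
    `Content-Type` = "application/json"
  )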
Let's say below that I would like to make 3 asynchronous calls using that same handler (or let's say handler versions, but maybe let's get there in a minute) against the same server, hence, I repeat the API host 3 times.
pool <- new_pool()
# Results are only available through the callback function
cb <- function(req) {
  cat("done:", req$url, ": HTTPS:", req$status, "\n",
      "content:", rawToChar(req$content), "\n")
}
# Example vector of URIs to loop through
uris <- c(
  "https://api-link.com",
  "https://api-link.com",
  "https://api-link.com"
)
sapply(uris, curl_fetch_multi, done = cb, pool = pool)
out <- multi_run(pool = pool)
After those lines the execution should take place, but instead of a great result I get one of the two errors below:
- Either just a 404, because the handler I defined above is not tied to any of those calls (each one is just a generic curl GET call).
- Or, if I change the last call to tie the handler in:
sapply(uris, curl_fetch_multi, done = cb, pool = pool, handle = h)
out <- multi_run(pool = pool)
Error in multi_add(handle = handle, done = done, fail = fail, data = data, :
Handle is locked. Probably in use in a connection or async request.
So the documentation even says that a handler can't be used more than once, but then I'm having trouble understanding how to organise this pipeline of asynchronous calls the right way. Has anyone come across a similar issue and found a viable solution?
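One direction I am considering, as an untested sketch: since the error says the handler is locked once it has been added to the pool, I assume each request needs its own fresh handler. In the code below, make_post_handle and run_parallel_posts are hypothetical helper names of my own, bodies is an assumed list with one request body per URI, and "XXX" is still a placeholder token. It also collects the callback results into a list instead of printing them.

```r
library(curl)
library(jsonlite)

# Hypothetical helper: build a brand-new POST handle for a single request body.
make_post_handle <- function(body, token = "XXX") {
  h <- new_handle()
  # Setting copypostfields turns the request into a POST with this payload
  handle_setopt(h, copypostfields = as.character(toJSON(body, auto_unbox = TRUE)))
  handle_setheaders(h,
    "Authorization" = token,
    "Content-Type" = "application/json"
  )
  h
}

# Hypothetical helper: queue one request per URI, each with its own handle,
# run them concurrently, and return the responses in input order.
run_parallel_posts <- function(uris, bodies, token = "XXX") {
  stopifnot(length(uris) == length(bodies))
  pool <- new_pool()
  results <- vector("list", length(uris))
  for (i in seq_along(uris)) {
    local({
      idx <- i  # capture the current index for the callbacks
      curl_fetch_multi(
        uris[[idx]],
        done = function(res) {
          results[[idx]] <<- list(
            status = res$status_code,
            content = rawToChar(res$content)
          )
        },
        fail = function(msg) {
          results[[idx]] <<- list(error = msg)
        },
        pool = pool,
        handle = make_post_handle(bodies[[idx]], token)
      )
    })
  }
  multi_run(pool = pool)  # blocks until all queued requests have finished
  results
}

# Usage (not run here, the host is a placeholder):
# results <- run_parallel_posts(uris, bodies)
```

The idea is that the pool is shared but no handler is ever reused, so the "Handle is locked" error should not be triggered. I am not sure whether this is the intended way to use the multi interface.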
Thanks!