Hello,
I have built a forecast with fable for some metrics at my work. The current run time for one metric is 12 hrs on a single core (single-threaded). I created a function that leverages multidplyr, and that has reduced the run time for the forecasts to about 4-5 hrs on 4 cores on my laptop. With 5 metrics, the run time reaches 20 hrs, and we will eventually be forecasting about 20 metrics. This is unsustainable to run on my laptop.
My solution was to build a plumber API on our internal Kubernetes infrastructure. This entailed using the docker image rocker/tidyverse:3.6.3 on Ubuntu. I have built 3 R packages that make it easy to port the functions I need into the container from our internal GitHub. I have several endpoints: one for the forecast and the rest for testing. The Linux container has 32 cores and 93 GB of RAM. One of the main reasons I went this route is that the other internal solutions for running R code are on R 3.4 or don't have an architecture that allows the process to finish.
The API does what it is expected to do: it pulls data from Postgres and Hive, builds the forecast models, generates the forecasts, and writes them back to Hive in roughly 2 hrs. The API is called using curl from our in-house automation solution. The URL is https, but the underlying infrastructure is http.
curl -X GET "https://url/endpoint" \
-H "Content-Type: application/json" \
-d '{"run":"TRUE","run_wk_beg_d":"<PARM="PERIOD_END"/>","vd":"<PARM="vd"/>","schema":"<PARM="schema"/>","tablename":"","append":"FALSE","cores":"28"}' \
-v -L -m 21600 --connect-timeout 21600 --keepalive-time 21600
The issue I am having is that once the API call ends and the data is written to Hive, I receive a curl error 52: Empty reply from server. In other words, there is a response returned to the client but it is empty. Our automation system expects to receive a "0" response from the server; if it does not, it labels the job as an ERROR even though it completed. It will also lock the job from running if it sustains too many errors.
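To make it concrete what the automation side is reacting to, here is a minimal sketch of how the curl exit status could be inspected from a shell. The function name classify_curl_exit and the labels it prints are hypothetical and are not part of our automation system. The exit codes themselves (0, 7, 28, 52) are curl's documented values.

```shell
#!/bin/sh
# Hypothetical helper: translate a curl exit status into a readable label.
# The labels are made up for illustration; only the numeric codes come from curl.
classify_curl_exit() {
  case "$1" in
    0)  echo "OK" ;;              # transfer completed normally
    7)  echo "CONNECT_FAILED" ;;  # could not connect to the host
    28) echo "TIMEOUT" ;;         # operation timed out (-m / --connect-timeout)
    52) echo "EMPTY_REPLY" ;;     # server closed the connection without replying
    *)  echo "OTHER_ERROR" ;;     # any other non-zero exit status
  esac
}

# Usage (not executed here; the URL is the placeholder from the call above):
#   curl -sS -m 21600 "https://url/endpoint"
#   classify_curl_exit "$?"
```

In my case the second line of the usage would print EMPTY_REPLY, even though the forecast itself was written to Hive.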
I have tried the code below at the end of my endpoint function to force some kind of a response to be sent back to the client but that didn't work either.
Sys.sleep(60)
resp_1 <- paste0("Forecast upload complete for volume driver: ",vd)
resp_2 <- paste0("Forecast table number of rows: ", nrow(upload_str_fx))
res$setHeader("Content-Type: application/json","0")
res$body <- list(Results = resp_1,
df_rows = resp_2
)
res$toResponse()
I am hoping there is some plumber code or option that could help me with this issue by forcing a reply when the API process ends. I am new to using curl, so I am thinking there might also be a curl option that could help, but I am not sure. I have looked everywhere online for a solution to this, to no avail. "You are my last hope, help me Obi Wan!"
Thanks!