Plumber API Curl Error 52: * Empty reply from server

Hello,

I have built a forecast with fable for some metrics at my work. The current run time for one metric is 12 hrs on one core (single threaded). I created a function that leverages multidplyr, which has reduced the run time to about 4-5 hrs on 4 cores on my laptop. With 5 metrics the run time reaches 20 hrs, and we will eventually be forecasting about 20 metrics. This is unsustainable to run on my laptop.

My solution was to build a plumber API on our internal Kubernetes infrastructure. This entailed using the Docker image rocker/tidyverse:3.6.3 on Ubuntu. I have built 3 R packages that make it easy to port the functions I need into the container from our internal GitHub. I have several endpoints: one for the forecast and the rest for testing. The Linux container has 32 cores and 93 GB of RAM. One of the main reasons I went this route is that the other internal options for running R code are on R 3.4 or don't have the architecture that allows the process to finish.

The API does what it is expected to do: it runs and pulls data from Postgres and Hive, builds the forecast models, builds the forecast, and writes the forecasts back to Hive in roughly 2 hrs. The API is called using curl from our in-house automation solution. The URL is HTTPS, but the underlying infrastructure is HTTP.

curl -X GET "https://url/endpoint" \
-H "Content-Type: application/json" \
-d '{"run":"TRUE","run_wk_beg_d":"<PARM="PERIOD_END"/>","vd":"<PARM="vd"/>","schema":"<PARM="schema"/>","tablename":"","append":"FALSE","cores":"28"}' \
-v -L -m 21600 --connect-timeout 21600 --keepalive-time 21600

The issue I am having is that once the API call ends and the data is written to Hive, I receive a curl error 52: Empty reply from server. In other words, a response is returned to the client but it is empty. Our automation system expects to receive a "0" response from the server; if it does not, it labels the job as an ERROR even though it completed. It will also lock the job from running if it sustains too many errors.

I have tried the code below at the end of my endpoint function to force some kind of a response to be sent back to the client, but that didn't work either.

  Sys.sleep(60)
  resp_1 <- paste0("Forecast upload complete for volume driver: ",vd)
  resp_2 <- paste0("Forecast table number of rows: ", nrow(upload_str_fx))
  res$setHeader("Content-Type: application/json","0")
  res$body <- list(Results = resp_1,
                   df_rows = resp_2
  )
  res$toResponse()

I am hoping there is some plumber code or option that could help me with this issue and force a reply to be sent when the process in the API ends. I am new to using curl, so I am thinking there might also be a curl option that could help, but I am not sure. I have looked everywhere online for a solution to this, to no avail. "You are my last hope, help me Obi-Wan!"

Thanks!

Can you provide your whole test route definition?
I believe you should not call res$toResponse() and just return res.
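
For example, something roughly like this (an untested sketch just to show the shape; vd and upload_str_fx come from your own code). Also note that setHeader() expects the header name and its value as two separate arguments, e.g. res$setHeader("Content-Type", "application/json").

#* @serializer json
#* @get /fx
function(vd = "FALSE", res) {

  # ... forecast + Hive upload ...

  # Return a plain R object and let the serializer build the response body...
  list(
    Results = paste0("Forecast upload complete for volume driver: ", vd),
    df_rows = nrow(upload_str_fx)
  )
  # ...or, if you fill in res$body yourself, end the function with res
  # rather than calling res$toResponse().
}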


"to force some kind of a response to be sent back to the client"

Once curl stops listening, plumber will not be able to push data back to curl.


I would discourage using VERY long curl requests. If possible, try to use a status-based approach. Discussion: How to handle long polling process · Issue #497 · rstudio/plumber · GitHub
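
As a rough illustration of what I mean by status-based (this is not the code from that issue; the endpoint names and the use of callr here are my own assumptions, just one way to do it): a start endpoint launches the forecast in a background R process and returns immediately, and your automation system then polls a status endpoint instead of holding a single connection open for 2 hours.

library(plumber)
library(callr)

# in-memory registry of background jobs
jobs <- new.env()

#* Kick off the forecast in a background R process and return right away
#* @post /fx/start
function(vd = "FALSE", cores = 1) {
  id <- format(Sys.time(), "%Y%m%d%H%M%OS3")
  jobs[[id]] <- callr::r_bg(
    function(vd, cores) {
      # the 2 hr forecast + Hive upload would live here
      Sys.sleep(5)  # placeholder for the real work
      paste("Forecast upload complete for volume driver:", vd)
    },
    args = list(vd = vd, cores = as.integer(cores))
  )
  list(job_id = id, status = "running")
}

#* Poll this until the status comes back "done"
#* @get /fx/status
function(job_id = "") {
  job <- get0(job_id, envir = jobs, inherits = FALSE)
  if (is.null(job)) return(list(status = "unknown job_id"))
  if (job$is_alive()) return(list(status = "running"))
  # get_result() re-throws the error if the background job failed
  list(status = "done", result = job$get_result())
}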

Can you determine when the response is being cut off? 5 minutes in? I'd double check that your machine host (such as Amazon) (or a service in the middle, ex: your SSL service) isn't cutting off the communication.

httpuv (the web engine in plumber) will not close the connection.

What do you mean by a test route definition? I use this to run the API, where rest_controller.R is the plumber.R in most documentation:

plumber::pr("rest_controller.R") %>% plumber::pr_run(port=80, host="0.0.0.0")

The endpoint has the following header:

#* @param run:string Flag to run the API
#* @param run_wk_beg_d:string Date in the week running the forecast.
#* @param vd:string Name of Volume Driver
#* @param tablename:string Available for Dev builds of the data in Hive.
#* @param append:string Whether to append to the current table; not going to be used though.
#* @param cores:int Cores to use in the parallel cluster.
#* @response 200 Forecast has been uploaded to Hive.
#* @serializer json
#* @get /fx
#* @post /fx
function(run="FALSE", 
         run_wk_beg_d="FALSE",
         vd="FALSE",
         schema="FALSE",
         tablename="FALSE",
         append="FALSE",
         cores=1,
         res) {

...R code for the forecast and SSH into Hive...

  message("Forecast upload complete!")
  message("Forecast table number of rows:", nrow(upload_str_fx))
  
  Sys.sleep(60)
  resp_1 <- paste0("Forecast upload complete for volume driver: ",vd)
  resp_2 <- paste0("Forecast table number of rows: ", nrow(upload_str_fx))
  res$setHeader("Content-Type: application/json","0")
  res$body <- list(Results = resp_1,
                   df_rows = resp_2
  )
  res$toResponse()

}

To answer your second question, the logs from our automation portal gives me the following details:

The output pattern: during the 2-hour run the logs keep receiving lots of numbers from the server; I don't know what they are. They look like this:

2:06:24 --:--:--     0
100   125    0     0    0    125      0      0 --:--:--  2:06:25 --:--:--

... continues this pattern for almost 2 hrs till the end.

Automation portal logs:
begin time: '2021-01-21 08:47:08'
curl error generated @: 2021-01-21 10:53:38

  • error: curl(52) * Empty reply from server

end time: 2021-01-21 10:53:41

Kibana logs show the output from the run:

January 21st 2021, 10:52:37.291 Forecast upload complete!
January 21st 2021, 10:52:37.291 Forecast table number of rows:760864
January 21st 2021, 10:52:32.560 [1] "# Disconnect and rm password and csv file from edge node"

So right after the forecast is written to Hive with the SSH package at 10:52, it throws the curl error at 10:53. So curl is connected the entire time. You can see the output matches the code above. When the API sends a response, the response is empty.

Maybe return just 0, and use headers for your messages?

Does your automation system expect the 0 in JSON format?
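
Something like this, for example (a guess, untested; the X-Forecast-* header names are made up, and vd / upload_str_fx are whatever you already have in your endpoint):

#* @serializer unboxedJSON
#* @get /fx
function(vd = "FALSE", res) {

  # ... forecast + Hive upload ...

  # human-readable details go into response headers instead of the body
  res$setHeader("X-Forecast-Message",
                paste0("Forecast upload complete for volume driver: ", vd))
  res$setHeader("X-Forecast-Rows", as.character(nrow(upload_str_fx)))

  # the body is just the 0 your automation system is looking for
  0
}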


Interesting idea. Let me try that as rebuilding the cluster won't take long.

@meztez That didn't work. I am going to try returning res to see if that works, but I thought that didn't work before. Let's see.

Actually, before testing res, I decided to run the Swagger version of the API on a test run. I got the result below, which I don't see in the logs of the automation portal or Kibana. The test run built the data in Hive as expected.

503	Error: Service Unavailable

response headers:

content-encoding: gzip  
content-length: 20 
content-type: text/plain;charset=utf-8  
date: Thu, 21 Jan 2021 20:55:44 GMT

503 is a timeout error. So your API takes more than the allowed timeout to return. Like @barret said, I would advise against long-running API calls, as multiple middlemen could drop the connection.

Do you have an nginx or apache server in front of plumber that could be producing the timeout?

I don't know how the actual Kubernetes architecture is put together. What I know is that with a Docker image in GitHub and a Vela yml file, I can have a Docker container built on Linux with a working URL.

I have read that it is not recommended to run an API call that takes a long time. I do not currently have a choice if I want to get this off of my laptop and into production. I can't run my current code on the other architecture platforms at work, as they don't have the R version/cores/RAM I need to run the process, hence the Docker container.

Which docker image and what Vela.yml file?

Unfortunately, I can't share those, but I wish I could so we could get this issue figured out. The Docker image is based on rocker/tidyverse:3.6.3, but I have added to it to meet different requirements for my project and our architecture at work.

@barret Hi! So I tried having just res returned at the end of the script but I get the same error.

I read through the GitHub issue that you posted above, but I can't make heads or tails of it just yet. I will have to read it a few more times to better understand what is going on.

Any other ideas? Thanks!

@fredoxvii make sure to toggle the arrow next to tokic.R to see the code.

Your long-running code should go inside a later block or something similar; it is worth the time to investigate.
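
For what it's worth, the bare-bones shape of that idea is something like this (untested; run_forecast() is a stand-in for your real code): the endpoint schedules the heavy work with later and answers right away. The caveat is that the scheduled work still ties up the single R process while it runs, which is why a background process plus a status endpoint, as discussed earlier, is usually the more robust route.

library(later)

#* Schedule the forecast and respond immediately
#* @post /fx/start
function(vd = "FALSE") {
  later::later(function() {
    # the long-running forecast + Hive upload would go here
    run_forecast(vd)
  }, delay = 0)
  list(status = "started", vd = vd)
}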

Yes, that was the part that I needed to get my head around. This is not an easy issue to resolve, but I feel like it is going to become more common as long-running processes increasingly need to be callable from an API.

Thanks everyone!
