Response takes much longer than endpoint function to return result, serializer issue?

matmu · September 30, 2021, 4:59pm

I have created an endpoint with plumber (plumber_1.1.0) that generates and returns a data.frame with 400 rows and 35k columns of numeric values, which corresponds to a tab-delimited file of 243MB. When I query this endpoint, e.g. on the command line with wget or curl, it takes around 7 seconds until the download starts. However, the endpoint function itself finishes after around 1 second, as measured with system.time. I am wondering where the other 6 seconds come from. If they can be attributed to the serialization to TSV, that implementation seems rather slow.

plumber.R

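#* @get /test
function(res) {
  # Serialize the return value as tab-separated values
  res$serializer <- serializer_tsv()

  # Time only the data generation step
  time <- system.time({
    result <- data.frame(replicate(400, runif(35000, min = 0, max = 100)))
  })
  print(time)

  # Return the data frame; plumber passes it to the serializer
  result
}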

run-plumber.R

#!/usr/bin/env Rscript

library(plumber)
pr("plumber.R") %>% pr_run(port = 4000, host = "0.0.0.0")

meztez · September 30, 2021, 5:22pm

You can test the serialization time by itself.

github.com/rstudio/plumber/blob/06e46f3ff5119e5f1cb8af29ef49aecb3cbb932a/R/serializer.R#L205

#' @describeIn serializers TSV serializer. See also: [readr::format_tsv()]
#' @export
serializer_tsv <- function(..., type = "text/tab-separated-values; charset=UTF-8") {
  if (!requireNamespace("readr", quietly = TRUE)) {
    stop("`readr` must be installed for `serializer_tsv` to work")
  }

  serializer_content_type(type, function(val) {
    readr::format_tsv(val, ...)
  })
}

The implementation uses readr, so you can time the same call directly on the data frame your endpoint returns:

system.time(readr::format_tsv(result))

You can swap in your own implementation with something like:

serializer_tsv <- function(..., type = "text/tab-separated-values; charset=UTF-8") {
  if (!requireNamespace("readr", quietly = TRUE)) {
    stop("`readr` must be installed for `serializer_tsv` to work")
  }

  serializer_content_type(type, function(val) {
    # Replace this call with your own formatter
    readr::format_tsv(val, ...)
  })
}

# Register the function under the name "tsv"
register_serializer("tsv", serializer_tsv)

meztez · September 30, 2021, 6:50pm

Seems like 7 seconds makes sense. I've tested a few other packages and nothing has been significantly faster so far.

result = data.frame(replicate(400, runif(35000, min=0, max=100)))
system.time(a <- readr::format_tsv(result))
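To see how the time splits between generating the data and formatting it, the two stages can be timed separately. The following is a minimal sketch, not a benchmark: it uses a much smaller frame than the one in this thread so it runs quickly, and `format_tsv_any` is a hypothetical helper that falls back to base R when readr is not installed.

```r
# Smaller than the frame used in the thread so the sketch runs quickly.
n_rows <- 2000
n_cols <- 50

# Stage 1: generate the data, as the endpoint does.
t_generate <- system.time(
  result <- data.frame(replicate(n_cols, runif(n_rows, min = 0, max = 100)))
)

# Hypothetical helper: use readr when installed, else a base R fallback.
format_tsv_any <- function(df) {
  if (requireNamespace("readr", quietly = TRUE)) {
    return(readr::format_tsv(df))
  }
  lines <- utils::capture.output(
    utils::write.table(df, sep = "\t", quote = FALSE, row.names = FALSE)
  )
  paste0(paste(lines, collapse = "\n"), "\n")
}

# Stage 2: format the frame as a single TSV string.
t_format <- system.time(body <- format_tsv_any(result))

# Elapsed seconds per stage and the size of the formatted body.
print(c(generate = t_generate[["elapsed"]], format = t_format[["elapsed"]]))
cat("body size (MB):", round(nchar(body, type = "bytes") / 1e6, 2), "\n")
```

Raising `n_rows` and `n_cols` to the real dimensions shows how each stage scales on a given machine.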

matmu · October 1, 2021, 7:00am

Thank you @meztez. At least now it is clear what the main issue is. However, it seems that plumber adds some overhead and I am not sure where it is coming from. On my server readr::format_tsv takes around 4.3 seconds, which together with the roughly 1 second for the endpoint function doesn't add up to the 7 seconds for the above example. I have seen the same gap in other examples.
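One way to narrow down where the remaining time goes might be to time the serialization stage inside plumber itself with router hooks. The sketch below assumes plumber's `preserialize` and `postserialize` hooks, that `data` is a per-request environment shared between hooks, and that hooks declaring `value` should return it. `start_timer`, `stop_timer`, and `build_router` are hypothetical names.

```r
# Record the time just before plumber serializes the endpoint's return value.
# Assumption: `data` is an environment that plumber shares across the hooks
# of a single request.
start_timer <- function(data, value) {
  data$serialize_start <- proc.time()[["elapsed"]]
  value  # hooks that take `value` hand it back unchanged
}

# Report the elapsed time once serialization has finished.
stop_timer <- function(data, value) {
  data$serialize_elapsed <- proc.time()[["elapsed"]] - data$serialize_start
  message(sprintf("serialization took %.2f s", data$serialize_elapsed))
  value
}

# Attach both hooks to the router defined in plumber.R.
build_router <- function(file = "plumber.R") {
  router <- plumber::pr(file)
  plumber::pr_hooks(router, list(
    preserialize = start_timer,
    postserialize = stop_timer
  ))
}

# Usage (not run here):
# plumber::pr_run(build_router(), port = 4000, host = "0.0.0.0")
```

Comparing this number with the endpoint's own system.time output and the delay seen by wget or curl would show whether the gap lies inside the serializer or in the steps around it.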
