Quickly read a TSV file from user-supplied input on Posit Connect

Hello,

My Shiny app reads a user-supplied counts matrix in TSV format and creates a bar graph. On my local computer, uploading my file (15,000 rows x 65 columns, 14.2 MB) takes less than a second. However, after I publish to Posit Connect, it takes 10-15 seconds. How do I make this quicker? Can I adjust the memory in Posit Connect?

library(shiny)
library(ggplot2)

shinyApp(
  
  ui = fluidPage(
    
    titlePanel("Visualize Counts Matrix"),
    
    sidebarLayout(
      
      sidebarPanel(
        radioButtons(
          inputId = "filetype", 
          label = "Select a file type",
          choices = c("tsv") # will add more file types later
        ),
        fileInput(
          inputId = "upload", 
          label = "Upload a counts matrix",
          accept = c(".tsv") # will add more file types later
        ),
        uiOutput(
          outputId = "geneOptions"
        )
      ),
      
      mainPanel( 
        plotOutput(outputId = "plot")
      )
      
    ) # end sidebarLayout
  ), # end of fluidPage
  
  server = function(input, output, session) {
    
    output$geneOptions <- renderUI({
      req(input$upload)
      selectizeInput(inputId = "goi",
                     label = "Select a gene",
                     choices = rownames(getData()))
    })
    
    getData <- reactive({

      req(input$filetype)
      req(input$upload)

      ext <- tools::file_ext(input$upload$datapath)
      df <- data.frame()
      
      if (ext == "tsv") {
        df <- read.table(file = input$upload$datapath, sep = "\t", header = TRUE, row.names = 1)
      }
      
      df
      
    })
    
    getGeneData <- reactive({
      req(input$upload)
      df <- getData()
      df <- reshape2::melt(df[input$goi,])
      colnames(df) <- c("sample","values")
      df
    })
    
    output$plot <- renderPlot({
      req(input$upload)
      req(input$goi)
      geneData <- getGeneData()
      
      p <- ggplot(data = geneData, mapping = aes(x = sample, y = values)) +
        geom_col() +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
        scale_x_discrete(limits = geneData$sample)
      p
    })
    
  } # end of server code
) # end shinyApp()

There are two steps. First, the data needs to be uploaded to the server (i.e. the file itself needs to be transferred). Second, you need to load it into memory (i.e. have R read the content of the file).

The first step depends on the speed of the Internet connection between the user and the server and has nothing to do with R. One way to speed it up is to recommend that the user first compress the tsv file (zip or gzip), so that a smaller file is uploaded.
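If the user uploads a gzip-compressed file, the reading side needs only a small change. Here is a minimal sketch in base R, assuming a .tsv.gz upload; the toy data and the helper name read_counts() are made up for illustration and are not part of the original app:

```r
# Hypothetical toy counts matrix standing in for the user's file.
counts <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  sample1 = c(10L, 0L, 5L),
  sample2 = c(3L, 7L, 1L)
)

# Write it gzip-compressed, as the user would do before uploading.
gz_path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(gz_path, open = "wt")
write.table(counts, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Hypothetical helper: read a plain or gzip-compressed TSV.
read_counts <- function(path) {
  # gzfile() can also read uncompressed files, so one code path covers both
  read.table(gzfile(path), sep = "\t", header = TRUE, row.names = 1)
}

df <- read_counts(gz_path)
df
```

In the app, the accept argument of fileInput() and the extension check in getData() would also need to allow the compressed file; note that tools::file_ext() returns "gz" for a name ending in .tsv.gz.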

The second step is about the function used to read the data. read.table() is not the fastest option available. From memory, data.table::fread() is the fastest, though nowadays readr::read_tsv() (which uses {vroom} under the hood) might be just as fast.
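As a rough sketch of what swapping the reader could look like: fread() has no row.names argument, so the row names the app relies on have to be restored by hand. The helper name read_counts_fast() and the toy file are assumptions for illustration, and the sketch assumes the first column holds unique gene identifiers:

```r
# Hypothetical helper: a faster stand-in for the read.table() call,
# assuming the first column holds unique gene identifiers.
read_counts_fast <- function(path) {
  df <- data.table::fread(path, sep = "\t", header = TRUE, data.table = FALSE)
  rownames(df) <- df[[1]]  # restore the row names that row.names = 1 provided
  df[[1]] <- NULL          # drop the identifier column from the data
  df
}

# Toy file for illustration only.
path <- tempfile(fileext = ".tsv")
writeLines(c("gene\ts1\ts2", "geneA\t10\t3", "geneB\t0\t7"), path)

df <- read_counts_fast(path)
df
```

Unlike read.table(), fread() does not run column names through make.names() by default, so sample names may differ slightly between the two readers.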

In any case, before you start changing everything you should first figure out which part is slow:

1/ try removing the code that reads the file, and only signal when the file has been uploaded successfully (if it's still slow, there is no need to optimize the reading part)

2/ on the server, replace the reading function with a bench::mark() call that tries different approaches (bench::mark() will automatically read the file several times). You can then see whether reading the file is reproducibly slow and whether alternative approaches are faster (in which case there is no need to bother about upload speed)
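The comparison in step 2 could look something like the sketch below. It assumes the {bench}, {data.table} and {readr} packages are installed on the server, and it generates a small synthetic matrix so that it runs on its own; in the app, path would be input$upload$datapath instead:

```r
library(bench)

# Assumption: a synthetic counts matrix stands in for the uploaded file.
set.seed(1)
mat <- matrix(
  rpois(1000 * 10, lambda = 20),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("s", 1:10))
)
path <- tempfile(fileext = ".tsv")
write.table(data.frame(gene = rownames(mat), mat), path,
            sep = "\t", quote = FALSE, row.names = FALSE)

timings <- bench::mark(
  base = read.table(path, sep = "\t", header = TRUE, row.names = 1),
  data.table = data.table::fread(path, sep = "\t"),
  readr = readr::read_tsv(path, show_col_types = FALSE),
  check = FALSE,       # the readers return different classes, so skip the equality check
  min_iterations = 5   # read the file at least five times per approach
)

timings[, c("expression", "median", "mem_alloc")]
```

If the median times are all well below the 10-15 seconds observed in the app, the reading step is unlikely to be the bottleneck.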

Thanks for this info! The slow step is uploading to the server - I will try zipping the file!
