Ideas/directions for the fastest possible way to load and subset data in a Shiny app

sascha · June 29, 2020, 9:28am

I'm here to get some ideas and directions to make a Shiny app more performant. Here is my setup:

This is the app I am talking about: https://owid.shinyapps.io/cOWIDplusViewer_BigramSearch_en/
One dataset with approx. 4 million rows and 5 columns (pryr::object_size gives 207 MB), column types are date, chr, int, num, logi
In some cases, I need regular expressions to search for entries in the str column and subset the dataset accordingly, currently I am using stringi::stri_detect_regex for this.

I am not satisfied with the current performance of the Shiny app and am wondering whether I could speed it up significantly. The main bottlenecks of performance are, as I can see:

Loading the data into memory (only during start-up)
Subsetting the data (everytime a new search is triggered)

Currently, my only idea is to use an external database which I could query from the Shiny app. This should get rid of the loading time at the beginning and should decrease memory usage of the app, right? But will this also speed up subsetting the data according to the parameters the users can set? I don't have any experience using databases (not with R, not with Shiny, not at all). So I cannot judge if this would make a considerable difference. So, I guess my questions are:

What is the optimal setup in a Shiny app where regex subsetting is required from a data source with 4 million rows?
Would an external database reduce loading time?
Would an external database reduce subsetting time?
If so: Which external database should I use as a complete beginner related to databases? (Of course, it should play nice, easy and fast with R and Shiny).

Thanks a lot for some pointers, your help is much appreciated!

system · July 20, 2020, 9:39am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.