Simple online hosting solution for single data frame for RShiny app?

I am developing an RShiny application with a goal to serve ca 1000 daily users. I do have a data frame that is too big to be included in the application itself - 33M observations and 3 variables (chr, Date, num) I am looking for a solution where to host this data frame. The data frame can be publicly accessed and the users do not need edit the data in any way. The data frame includes about 10 000 unique characters strings which the application then uses to subset it by filtering out 10 to a new data frame. No complex queries needed.

What could be the most compatible solution?

EDIT: 33M not 3M observations*

I'm going to go lawyer on you :wink:

Do you have evidence or further argument for this claim ?
I think if the string values, were stored as factors, and you use a file format with high compression, then you can store such data very efficiently and bundle with your app.

Worked example :

library(tidyverse)

bigframe_1 <- expand.grid(a=letters,b=LETTERS,c=letters) |> head(n=10000) |> rowwise() |> mutate(
  across(everything(),\(x)paste0(rep(x,10),collapse=""))
,
  txt=paste0(a,b,c,collapse=";")
) |> ungroup() |> select(-a,-b,-c) |> expand_grid(Date=seq.Date(from=as.Date("2020-01-01"),
                                                                to = as.Date("2020-01-30"),by="1 day"),
                                                  num=1:11)

bigframe_2 <- bigframe_1 |> mutate(txt=factor(txt))

pryr::object_size(bigframe_1)
pryr::object_size(bigframe_2)

library(fst)

write_fst(bigframe_1,"bf1.fst",compress = 100) # 6.3Mb
write_fst(bigframe_2,"bf2.fst",compress = 100) # .5Mb

library(qs)
qs::qsave(bigframe_1,file = "bf1.qs") # .2Mb
qs::qsave(bigframe_2,file = "bf2.qs") # .1Mb

100Kb using qs on factor txt seems like an easy thing to budget.

In my case using qs::qsave() does not result any compression and I know why:
Third column is "numeric" and not "integer".

How I overcame it : as.integer(x*1000)

Now that we have a compressed file and we bundle it with the app it indeed does make the application smaller, but once we load the file to data frame it uses the same amount of RAM as originally. Is there a way to subset the .qs file while reading?

Maybe look at arrow/parquet.

lighting fast!!... the PARQUET file is 203,391 KB though.

Still looking for database recommendations.

You might check out DuckDB, it has a lot of features built-in for accessing parts of a dataset rather than having all of it directly in memory:

Even something as simple as SQLite would probably work well. The SQLite file to store the data will take whatever space it requires, but querying that file from R will only use the data needed for the query, not the entire file.

Best,
Randy

Given your requirements, a cloud-based database solution like Amazon RDS (Relational Database Service) or Google Cloud SQL could be a suitable option. You can store your large data frame in the database and access it through your RShiny application. Since you don't need complex queries and the data frame is publicly accessible, you can set appropriate permissions to allow read-only access to the database. This approach offers scalability, reliability, and the ability to handle a large number of daily users accessing the data concurrently. Additionally, you can consider optimizing the database performance by indexing the columns you frequently filter on to enhance query performance.

Overall that dataset doesn't seem big enough to require an external database. I would probably store it as a parquet file or arrow dataset and then use arrow to do the queries from Shiny.

The main reason to use an external data store is if you want to update the data without redeploying the app.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.