Feasibility of deploying a departmental Shiny server?

Thinking about taking on a project to build Shiny servers for development, staging, and public access for apps created by students, faculty, and staff. This would be hosted on Linux virtual servers maintained by local IT or on AWS.

Any advice appreciated!

For my personal use, I've set up Shiny Server on two different ARM64-based VMs (Oracle Linux and FreeBSD). Things to note:

  1. The installation instructions for Shiny Server are unorthodox in that they include an embedded Node package. My advice: ignore them completely and install Node yourself (I use version 18), since the bundled build ties you to x86, which is typically at least 25% more expensive in absolute cloud compute cost and much more expensive in price to performance.
  2. You'll also want to pick a webserver (I use nginx) and set up an SSL certificate mechanism. Since you are at a university, there's probably already a (hopefully lightweight) mechanism for certificate issuance, renewal, and revocation. If the process is genuinely lightweight, you could have a certificate per service; if not, expect it to be painful, and you'll want Server Name Indication (SNI) to route requests so you can amortize the bureaucratic annoyance over numerous users. I'd expect the security group at your school to have information on the certificate process as well as how to set up SNI.
  3. Installing R, Shiny, and the base Shiny apps is relatively easy after you've gotten nginx with SSL proxying requests to the loopback (127.0.0.1:3636).
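The nginx side of steps 2 and 3 looks roughly like the following sketch. The hostname and certificate paths are placeholders, and 3636 is the loopback port from my setup (Shiny Server's stock config listens on 3838):

```nginx
# Hypothetical server block: terminate TLS here and proxy to Shiny
# Server on the loopback. Paths and hostname are placeholders.
server {
    listen 443 ssl;
    server_name shiny.example.edu;   # SNI: one server block per hostname

    ssl_certificate     /etc/ssl/certs/shiny.example.edu.pem;
    ssl_certificate_key /etc/ssl/private/shiny.example.edu.key;

    location / {
        proxy_pass http://127.0.0.1:3636;
        # Shiny uses WebSockets; forward the upgrade headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

The WebSocket upgrade headers matter: without them, Shiny apps will load but lose their reactive connection.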

For my day job, I run production services, which raises questions you'll want to answer:

  • How will users package up their application and refresh it? For my app, I have the equivalent of an S3 bucket that I poll occasionally for changed files, copying them down if necessary and restarting.
  • You might have access to a Kubernetes infrastructure that abstracts a bunch of the above away. If so, you might be able to do little more than say: here's your base container, entry point, and build pipeline; commit code here.
  • Where does their data go? Will they have structured storage (e.g. relational or NoSQL databases), or will you support files? In either case, how much space will they be allocated? For my use case, I initially parsed CSV files on startup, but that got annoying quickly because it made my iterations slow. I moved to a separate data load/save step so I could deploy the R image directly, which reduced my startup time to essentially zero and used less disk space than a bunch of giant CSV files.
  • How will you ensure that stale applications get reaped? This matters if you're using a public cloud, as compute and storage costs continue to accrue even when an application is no longer used. Unlike the previous items, this probably isn't an MVP feature unless you have significant budget constraints.
  • How do you ensure you don't get overwhelmed by support requests? You'll need adequate logging and metrics to notify you of failures and help you restore service quickly when something breaks. Alternatively, you could declare support best-effort and get to it when you can.
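The poll-and-refresh loop from the first bullet can be sketched like this. It's a minimal illustration that uses a local directory to stand in for the S3-style bucket and content hashes to stand in for ETags; in a real deployment you'd list the bucket (e.g. with boto3) and restart Shiny when anything changed:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Content hash used to detect changed files (an ETag stand-in)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def sync_changed(remote: Path, local: Path, seen: dict) -> bool:
    """Copy files whose content changed since the last poll.

    `seen` maps relative paths to the last digest we copied.
    Returns True if anything changed, i.e. the app needs a restart.
    """
    changed = False
    for src in remote.rglob("*"):
        if not src.is_file():
            continue
        digest = file_digest(src)
        rel = src.relative_to(remote)
        if seen.get(rel) != digest:
            dest = local / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(src.read_bytes())
            seen[rel] = digest
            changed = True
    return changed
```

Run it from a timer (cron, systemd timer, or a sleep loop) and restart the app only when `sync_changed` returns True, so an unchanged bucket costs nothing but the poll.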
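The reaping policy from the fourth bullet can be as simple as comparing each app's last activity against an idle cutoff. The `last_used` timestamps here are hypothetical; in practice they'd come from access logs or a registration database:

```python
from datetime import datetime, timedelta

def stale_apps(last_used: dict[str, datetime],
               now: datetime,
               max_idle_days: int = 90) -> list[str]:
    """Return the names of apps idle longer than the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, ts in last_used.items() if ts < cutoff)
```

A cron job that notifies owners when their app lands on this list, then archives it after a grace period, keeps the cost of forgotten apps bounded without being hostile to users.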

Since I don't know anything about your environment, I explicitly didn't touch compliance and security requirements (e.g. vulnerability scanning/patching).

If I were setting this up, I'd probably do the following:

  • Talk to my org's public cloud people about access to EKS so each user can deploy their app in a separate container.
  • Create a base container with R and Shiny that complies with my org's security policies, and set up a way to refresh it easily so I can quickly fix vulnerabilities and end-of-life issues. I'm not sure whether I'd let people bring their own container or force them to use mine, but I lean towards being autocratic and requiring mine. Since refreshing the container is trivial, I can be a benevolent dictator: it's easy to add and update libraries.
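A minimal sketch of such a base container, assuming the rocker/shiny image as a starting point (the tag and package list are placeholders, not a vetted, policy-compliant build):

```dockerfile
# Hypothetical base image; rocker/shiny bundles R, Shiny, and Shiny Server.
# Pin the tag and rebuild on a schedule so security patches roll out.
FROM rocker/shiny:4.3.2

# Pre-install the libraries users are allowed to rely on; adding to this
# list and rebuilding is how package updates are distributed.
RUN R -e "install.packages(c('dplyr', 'ggplot2'), repos = 'https://cloud.r-project.org')"

# Users' derived images copy their app into the default app directory:
# COPY myapp/ /srv/shiny-server/myapp/
```

Keeping the user-facing contract this small (one FROM line and one COPY) is what makes the benevolent-dictator approach workable.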

TL;DR: it's rewarding if you can avoid the quagmire of compliance and security, but it's an enormous amount of work.

Thank you very much for your thorough response! You gave me a lot to digest.

Getting prematurely ahead of myself:
For user data, I understand that Google Cloud (and I assume GKE) provides high-bandwidth access to files stored on Google Drive. Would it be possible to require owners to load/save their data through a shared link to their own files stored on Drive? This still requires putting a reasonable limit on temporary session data.

For abandoned apps, I suppose it would be possible to require owners to update their app registrations periodically.

I will probably first try seeing how well campus IT provisions departmental VMs and then jump into the fun part of building a stand-alone server prototype. Are you running Kubernetes on your personal servers?

This topic was automatically closed 54 days after the last reply. New replies are no longer allowed.
