Minimal R installation

mmuurr · January 3, 2021, 9:30pm

I've been using AWS Lambda more regularly and am interested in experimenting with building a custom R runtime for it. For long-running functions, simply using the basic 'shell' runtime and calling R from the runtime's bootstrap sh file works fine, but the overhead of starting the R interpreter on every call makes this unusable for short, fast requests where low latency matters. Another solution some people have come up with is calling R from a Lambda Python runtime, which is a bit clunky and runs into some size issues (described below).

After doing a lot of research into the Python runtimes, it appears that most of the actual Runtime API client is implemented in C++ and simply wrapped by Python. That C++ base makes it a ripe candidate for wrapping with Rcpp to create a performant, native R runtime for AWS Lambda.
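For context on why a native runtime helps: a Lambda custom runtime is just a long-lived process that polls the Lambda Runtime API over HTTP, so an interpreter started once can serve many invocations without paying the startup cost each time. Below is a minimal sketch of that loop in Python (standard library only), assuming the documented 2018-06-01 endpoints and the AWS_LAMBDA_RUNTIME_API environment variable that Lambda sets for custom runtimes. The handler function is a hypothetical stand-in for wherever an embedded R session would evaluate the user's code; this is not the Python runtime's actual implementation.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def handler(event):
    # Hypothetical placeholder: in an R runtime, an already-running
    # R session would evaluate the user's function here.
    return {"echo": event}


def process_one(api_host, handle=handler):
    """Fetch one invocation, run the handler, and post its result."""
    base = f"http://{api_host}/{API_VERSION}/runtime/invocation"
    # Long-poll for the next event; the request ID comes back in a header.
    with urllib.request.urlopen(f"{base}/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())
    body = json.dumps(handle(event)).encode()
    req = urllib.request.Request(
        f"{base}/{request_id}/response", data=body, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return request_id


def main():
    # Lambda sets AWS_LAMBDA_RUNTIME_API to host:port for custom runtimes.
    api_host = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:  # the process (and interpreter) stays alive between events
        process_one(api_host)


if __name__ == "__main__" and "AWS_LAMBDA_RUNTIME_API" in os.environ:
    # Only enter the loop when launched by Lambda as the runtime bootstrap.
    main()
```

A production runtime would also report handler failures to the invocation's /error endpoint and handle initialization errors; those paths are omitted here. An Rcpp-based runtime would implement the same loop, just with R as the resident interpreter.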

One issue I'm running into while exploring this option, however, is the on-disk size of an R installation. The R-core package on EPEL, for example, installs 1.2 GB of files. A large chunk of this is TeX Live, Perl, the GCC toolchain ... all useful for developing with R, but I'm less sure any of them are needed for headless execution of R programs/functions. (Headless is an important distinction here, since help/documentation pages will not be needed.)
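Before deciding what to prune, it can help to see where an installation's bytes actually go. A minimal sketch in Python, assuming you point it at R's home directory (for example the path printed by "R RHOME"); it tallies apparent file sizes per top-level entry and skips symlinks, so totals may differ somewhat from du. Dependencies such as TeX Live or Perl live outside R_HOME and would need to be measured separately.

```python
import os
import sys
from pathlib import Path


def dir_size(path):
    """Total apparent size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not descend into symlinked dirs
        for name in files:
            p = os.path.join(root, name)
            if not os.path.islink(p):
                total += os.path.getsize(p)
    return total


def size_by_child(top):
    """Map each immediate child of top to its size in bytes, largest first."""
    sizes = {}
    for child in Path(top).iterdir():
        if child.is_symlink():
            continue
        sizes[child.name] = dir_size(child) if child.is_dir() else child.stat().st_size
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in size_by_child(sys.argv[1]).items():
        print(f"{size / 2**20:10.1f} MiB  {name}")
```

Usage would be something like python3 r_size.py "$(R RHOME)", where the script name is hypothetical; running it again on the library subdirectory shows which packages dominate.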

TeX Live (or perhaps MiKTeX) might be needed as a TeX installation for plotting, I think. And maybe that's another use of Perl (i.e., as a TeX dependency)?

For example, if plotting were explicitly disabled, what's the minimal set of files required for a base R executable installation? And what if plotting is permitted?

I'm excluding system libraries needed by any non-default package (i.e., any package not in options()$defaultPackages), since those libraries can (and should) be shipped as "layers" in the AWS Lambda world to keep images as small as possible.

I can't find much documentation on a minimal R installation, and none of the common package managers (yum, apt-get, brew) seem to offer such a split in their packages. Has anyone here run into this sort of situation, where the goal is to provide R runtime capability with a minimal installation (accepting that some R features, e.g. help pages, will be broken)?

(BTW, I think the idea of a performant R runtime for AWS Lambda is quite exciting, and I'd be more than happy to share the runtime with others as a FOSS project.)

system · January 24, 2021, 9:30pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
