I've been using AWS Lambda more regularly and am interested in experimenting with building a custom R runtime for it. For long-running functions, simply using the basic 'shell' runtime and calling R via the runtime's bootstrap sh file works fine, but the overhead of starting the R interpreter for each call makes this unusable for short, fast requests where low latency is important. Another solution that some have come up with is calling R from a Lambda Python runtime, which is a bit clunky and runs into some size issues (described below).
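For context, the per-call pattern I mean looks roughly like the sketch below. It follows the documented Lambda Runtime API endpoints, but `handler.R` and the function names are hypothetical placeholders, and it assumes `curl` and `Rscript` are on the path. The point is only that `Rscript` is launched once per event, which is where the startup cost comes from.

```shell
#!/bin/sh
# Sketch of a 'shell' custom runtime bootstrap that starts R once per event.
# handler.R and the function names below are hypothetical placeholders.

# Extract the request id from a saved HTTP header dump.
request_id() {
    grep -Fi 'Lambda-Runtime-Aws-Request-Id' "$1" | tr -d '[:space:]' | cut -d: -f2
}

# Run the handler on the event; a fresh R interpreter starts on every call.
invoke_handler() {
    printf '%s' "$1" | Rscript "${LAMBDA_TASK_ROOT:-.}/handler.R"
}

# Poll for the next event, run the handler, post the result back.
run_loop() {
    api="http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime"
    while true; do
        headers="$(mktemp)"
        event="$(curl -sS -LD "$headers" "$api/invocation/next")"
        id="$(request_id "$headers")"
        rm -f "$headers"
        response="$(invoke_handler "$event")"
        curl -sS -X POST "$api/invocation/$id/response" -d "$response"
    done
}

# Only enter the loop inside Lambda, where this variable is set.
if [ -n "${AWS_LAMBDA_RUNTIME_API:-}" ]; then
    run_loop
fi
```

A native runtime would instead keep a single R process alive across iterations of that loop, so the interpreter starts only on cold start.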
After doing a lot of research into the Python runtimes, it appears most of the actual Runtime API is implemented in C++ and simply wrapped by Python. This C++ base makes it a ripe candidate for wrapping with Rcpp to create a performant native R AWS Lambda runtime.
One issue I'm running into while exploring this option, however, is the size (on disk) of an R installation. The r-core package on EPEL, for example, installs 1.2 GB of files. A large chunk of this is texlive, perl, the gcc toolkit ... all useful for developing with R, but I'm less sure of the use of all of these during headless execution of R programs/functions. (Headless execution is an important distinction here, since help/documentation pages will not be needed.)
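To see where the space actually goes, something like the following can rank the immediate subdirectories of a directory by size. This is a rough sketch: `footprint` is a made-up helper name, and it only measures whatever directory it is given (for example the R home printed by `R RHOME`), not dependencies installed elsewhere such as texlive or gcc.

```shell
#!/bin/sh
# Rank the immediate subdirectories of a directory by on-disk size (KiB).
# footprint is a hypothetical helper name, not part of R or AWS tooling.
footprint() {
    dir="$1"
    for sub in "$dir"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$sub" ] || continue
        # du -sk prints "<size in KiB><TAB><path>" for the given path.
        du -sk "$sub"
    done | sort -rn
}

# Example: sh footprint.sh "$(R RHOME)"
if [ "$#" -gt 0 ]; then
    footprint "$1"
fi
```

Running it against the R home and then against the directories the package manager populated for each dependency gives a quick picture of what would be worth stripping.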
I think a TeX installation such as TeX Live (or perhaps MiKTeX) might be needed for plotting. And maybe Perl is pulled in for the same reason (i.e. as a TeX dependency)?
If plotting were explicitly disabled, for example, what's the minimal set of files required for a base R executable installation? What if plotting is permitted?
I'm excluding system libraries that are needed by any non-standard library (i.e. any package not in options()$defaultPackages), as those libraries can (and should) be included as "layers" in the AWS Lambda world in order to keep images as small as possible.
I can't find much documentation on a minimal R installation, and none of the common package managers (yum, apt-get, brew) seem to offer such a concept in their file bundles. Has anyone else here ever run into this sort of situation, where the goal is to provide R runtime capability but with a minimal installation (and thus permitting some R features to be broken, e.g. help pages)?
(BTW, I think the idea of a performant R runtime for AWS Lambda is quite exciting, and I'd be more than happy to share the runtime with others as a FOSS project.)