Reduce dplyr.so size

Hi All,

I'm running dplyr on AWS Lambda (via rpy2). I install it on an Amazon EC2 Linux instance with install.packages and then package the result into something that works with AWS Lambda, similar to https://github.com/nafiux/portableR

The compiled dplyr.so is about 33MB, however, which (along with all the other large R shared object and rdb files) starts to cause problems when uploading the package to AWS (there are ways around this using S3 upload, but I would like to avoid that if possible). For reference (and I'm not trying to solve these here), libRblas.so is also about 33MB, and stringi.so and its icudt55l.dat (which I believe are used by lubridate via stringr) are ~50MB combined.
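
To see which files dominate the bundle before deciding what to trim, something like the following could be used. This is only a minimal sketch: the function name `largest_files` and the directory name `r-library` are placeholders for illustration, not part of any existing tool.

```python
import os


def largest_files(root, top_n=10):
    """Return (size_in_bytes, path) pairs for the top_n largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted twice
            sizes.append((os.path.getsize(path), path))
    # Largest first; ties are broken by path.
    return sorted(sizes, reverse=True)[:top_n]


if __name__ == "__main__":
    # "r-library" is a placeholder; point it at the directory being packaged.
    for size, path in largest_files("r-library"):
        print(f"{size / 1e6:8.1f} MB  {path}")
```

Running it against the directory that gets zipped for Lambda should show whether the shared objects, the rdb files, or data files such as the ICU .dat file account for most of the space.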

Are there any configure.args options that can help reduce the overall dplyr.so size? For example, presumably debug symbols are already stripped, but what about using dynamic linking instead of static, or disabling uncommon functionality that I am unlikely to use (plotting/pdf/image output)?

thanks

roger


I'm having similar issues. We're trying to use AWS Lambda to schedule ETL jobs from our cloud software into AWS S3. We have not been able to get the Lambda running yet because our build is too large to fit in the 250 MB of allocated space. We'd like a way to install only specific functions of each library, since we only use a few, but we found another potential solution: http://dirk.eddelbuettel.com/blog/2017/08/20/#010_stripping_shared_libraries. I'll come back to this thread if and when we're able to solve our issue.
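
For an already-installed library tree, stripping could look roughly like the sketch below. It assumes GNU binutils `strip` is on the PATH and that removing debug sections with `--strip-debug` is acceptable for the deployment; the function name `strip_shared_objects` is made up for illustration, and the exact commands in the linked post may differ. How much space this saves depends on whether the libraries were compiled with debug information in the first place.

```python
import os
import subprocess


def strip_shared_objects(root, dry_run=True):
    """Run `strip --strip-debug` on every .so file under root.

    Returns the commands (as argument lists) that were executed, or that
    would be executed when dry_run is True.
    """
    commands = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".so"):
                cmd = ["strip", "--strip-debug", os.path.join(dirpath, name)]
                commands.append(cmd)
                if not dry_run:
                    # Raises CalledProcessError if strip fails on a file.
                    subprocess.run(cmd, check=True)
    return commands
```

Calling it with the default `dry_run=True` first only lists what would be stripped, which makes it easy to check the file list before modifying anything; it is worth keeping a copy of the unstripped build in case something stops loading.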