I have a script that runs inside a Docker image. The script calculates some values and pushes them to an S3 bucket. Everything has been tested and runs well.
The problem is that the script is slow. I would like to run it on each core of a 32-core server. I have never done this, so maybe it is naive of me to think it is possible.
There is a similar thread on this topic on Stack Overflow.
One suggested solution is running Rscript test_learn_script.R
together with the shell constructs nohup
(a POSIX command that makes a process ignore the HUP (hangup) signal) and &
(which runs a command in the background). Using these, a bash loop can be written as follows:
#!/bin/bash
# ---------------------------------------------------------------------------------
# Name: rscript_loop.sh
# Description: Runs an Rscript in the background on each iteration of the loop.
#              The goal is to parallelize the script. R scripts default to one
#              core. This loop should be able to extend to the number of cores
#              on a server.
#
# A solution provided here:
# https://stackoverflow.com/questions/31137842/run-multiple-r-scripts-simultaneously
# ---------------------------------------------------------------------------------
for i in $(seq 1 3); do
    Rscript test_learn_script.R "$i" &
done
wait  # keep the container alive until every background job has finished
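The loop above starts every job at once with no upper bound on concurrency. A minimal sketch of capping the number of simultaneous jobs at the core count, assuming GNU xargs and nproc are available in the image:

```shell
#!/bin/bash
# `nproc` reports the number of available cores; `xargs -P` caps how many
# Rscript processes run at the same time, feeding the next job index in as
# soon as a slot frees up.
CORES=$(nproc)
seq 1 32 | xargs -P "$CORES" -I {} Rscript test_learn_script.R {}
# xargs only returns once every job has finished, so no explicit `wait`
# is needed to keep the container alive.
```

The job count (32 here) and the script name follow the question's setup; xargs itself handles the scheduling, so the loop body disappears entirely.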
I made a simple test_learn_script.R
file for this question, which looks like the following.
library(aws.s3)

# aws.s3 picks these up from the environment (docker run passes them in
# with -e); the assignments just make them visible inside the script.
AWS_ACCESS_KEY_ID <- Sys.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY <- Sys.getenv("AWS_SECRET_ACCESS_KEY")
AWS_DEFAULT_REGION <- Sys.getenv("AWS_DEFAULT_REGION")

# The loop index passed in by the bash script gives each run a unique object name.
args <- commandArgs(trailingOnly = TRUE)
unique_name <- paste0("classification_df_", args[1], ".csv")

classification_df <- data.frame(replicate(10, sample(0:1, 1000, rep = TRUE)))
s3write_using(classification_df, FUN = write.csv,
              bucket = "www.tsdata",
              object = unique_name)
My test_learn_script.R
file runs fine with the following command when I am not iterating over it in a bash script.
docker run -e AWS_ACCESS_KEY_ID='***' -e AWS_SECRET_ACCESS_KEY='***' -e AWS_DEFAULT_REGION='***' my_docker_project
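One thing that carries over for free: environment variables set with docker run -e are inherited by every child process inside the container, so each backgrounded Rscript sees the same credentials without any extra plumbing. A minimal sketch of that inheritance (the exported value here is only a stand-in for what -e would inject):

```shell
#!/bin/bash
# Stand-in for `docker run -e AWS_DEFAULT_REGION=...`; the value is
# hypothetical, for illustration only.
export AWS_DEFAULT_REGION="us-east-1"

# Each backgrounded child process inherits the exported variable.
for i in 1 2 3; do
  ( echo "job $i sees region: $AWS_DEFAULT_REGION" ) &
done
wait
```

Every line printed shows the same region, confirming each background job reads the inherited credential variables.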
How can I parallelize my R code, which requires AWS credentials, so that it runs on all 32 cores of a server as a Docker image?
Also, my Dockerfile is below:
FROM rocker/tidyverse:3.5.0
#
## update Ubuntu package lists
RUN apt-get update
#
##install R packages
RUN Rscript -e 'install.packages("forecast")'
RUN Rscript -e 'install.packages("devtools")'
RUN Rscript -e 'install.packages("furrr")'
RUN Rscript -e 'install.packages("lubridate")'
RUN Rscript -e 'install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))'
RUN Rscript -e 'devtools::install_github("tidyverse/ggplot2")'
RUN Rscript -e 'devtools::install_github("robjhyndman/tsfeatures")'
RUN Rscript -e 'devtools::install_github("ykang/tsgeneration")'
RUN Rscript -e 'devtools::install_github("alexhallam/tsMetaLearnWrap")'
# Add files in local machine directory
ADD . /usr/local/src/
WORKDIR /usr/local/src/
CMD ["./rscript_loop.sh"]