I'm currently supporting a small team of university biologists using R.
The hope is to improve their analysis workflow to make it reproducible and to ensure their key packages work consistently. Currently they are working on their own machines, installing RStudio Desktop, installing packages as needed, with files dispersed across their OS ... it's a mess.
They have a major dependency on Seurat, with various scripts they've accumulated requiring different versions of Seurat, plus additional dependencies on various plain R packages (e.g. tidyverse), reticulate, and Python.
My goal is to move them to RStudio running in Docker for reproducibility. I have recently been reading up on renv as a means to ensure package consistency when using Rocker Docker images.
The documentation on renv with Docker is a great start, but frankly there are some gaps in turning it into a usable workflow for my team. My goal is to make this as fool-proof and simple as possible for the team. They are biologists, not IT experts; most are only passingly familiar with the command line.
My hope is to create a workflow where:
- Users can load a Docker container with RStudio and the usable version of Seurat they require as a basis for their work.
- They can add new packages as needed.
- They can create an RStudio project for their specific analysis.
- A new renv.lock and Dockerfile is created alongside that project, so that when it is called up later the analysis runs without a hitch.
To date I have the following.
A docker-compose.yml to make loading as painless as possible:
version: '3'
services:
  r_seurat:
    image: "aforsythe/r_seurat:dev"
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - "~/r_data/:/home/rstudio/"
      # host-side renv cache; container path must match RENV_PATHS_CACHE in the Dockerfile
      - "~/.renv_docker/cache:/renv/cache"
    ports:
      - "8787:8787"
    environment:
      - 'DISABLE_AUTH=true'
    restart: always
A Dockerfile:
FROM rocker/verse:4.0.4

ENV PATH=/root/miniconda3/bin:${PATH}
ENV RENV_PATHS_CACHE=/renv/cache
ENV RETICULATE_PYTHON=/root/miniconda3/bin/python
ENV RENV_VERSION=0.13.1

# Install Miniconda plus the Python packages used via reticulate
RUN wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -b && \
    rm Miniconda3-latest-Linux-x86_64.sh && \
    conda update -y conda && \
    conda list && \
    conda install -y numpy \
                     matplotlib \
                     pandas

# Install a pinned version of renv, then restore the base package set from renv.lock
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"

COPY ./renv.lock /renv/tmp/renv.lock
WORKDIR /renv/tmp
RUN R -e "renv::restore()"
WORKDIR /home/rstudio
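Once a container built from this Dockerfile is up, my understanding is that one can check from the RStudio console that renv is actually using the cache directory mounted by docker-compose; a quick sanity check (assuming the volume mapping above) would be:

# Confirm the renv cache inside the container points at the mounted directory
Sys.getenv("RENV_PATHS_CACHE")   # expected: /renv/cache
renv::paths$cache()              # expected: a versioned subdirectory under /renv/cache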
And a renv.lock file (hosted in a gist due to space constraints, and not linked inline due to link limitations in the forum):
https://gist.github.com/aforsythe/71fd5981d3d50066605b585fdc021b74
My question is: how can I use or modify what I have so far to accomplish the workflow I've outlined above?
I imagine the workflow going something like this:
- User clones a repo containing the Dockerfile, docker-compose.yml, and renv.lock.
- User runs docker-compose up -d.
- User visits http://localhost:8787 in a browser.
- User runs a small script (yet to be developed; see the first sketch after this list) to generate a template of directories and subdirectories (e.g. analysis_code, data, data_clean, figures, etc.), which would be created in a project-named subdirectory of /home/rstudio/.
- User writes code, loading packages as necessary (e.g. library(ggplot2)), with the option to install new packages as needed.
- User "saves" the project by running renv::init() plus a script (see the second sketch below) that creates a Dockerfile, such that when they revisit the project they load a container based on that project-specific Dockerfile and renv.lock.
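For the scaffolding script mentioned above, a rough and untested sketch of what I have in mind in R is below. The function name create_analysis_project() and the default directory names are only placeholders based on the layout described above, not an existing tool:

# Hypothetical project scaffolding helper: creates a project-named directory
# under /home/rstudio with the standard subdirectories and an renv project.
create_analysis_project <- function(name,
                                    base_dir = "/home/rstudio",
                                    subdirs = c("analysis_code", "data",
                                                "data_clean", "figures")) {
  project_dir <- file.path(base_dir, name)
  for (d in file.path(project_dir, subdirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Minimal .Rproj file so RStudio recognises the directory as a project
  writeLines("Version: 1.0", file.path(project_dir, paste0(name, ".Rproj")))
  # Set up renv infrastructure (bare = TRUE skips dependency discovery)
  renv::init(project = project_dir, bare = TRUE)
  invisible(project_dir)
}

The user would then call something like create_analysis_project("mouse_scrnaseq") from the RStudio console and open the generated .Rproj file.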
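And for the "save" step, one possible sketch, again hypothetical and untested: it uses renv::snapshot() on the assumption that the project was already initialised by the scaffolding step, and takes the base image name from the docker-compose.yml above:

# Hypothetical "save project" helper: snapshots the project library to
# renv.lock and writes a project-specific Dockerfile that restores from it.
save_analysis_project <- function(project_dir = getwd(),
                                  base_image = "aforsythe/r_seurat:dev") {
  # Record the exact package versions currently used by the project
  renv::snapshot(project = project_dir, prompt = FALSE)

  dockerfile <- c(
    paste("FROM", base_image),
    "COPY renv.lock /renv/tmp/renv.lock",
    "WORKDIR /renv/tmp",
    "RUN R -e \"renv::restore()\"",
    "WORKDIR /home/rstudio"
  )
  writeLines(dockerfile, file.path(project_dir, "Dockerfile"))
  invisible(file.path(project_dir, "Dockerfile"))
}

Whether revisiting a project then means building from that generated Dockerfile directly or wiring it into a second docker-compose file is something I have not settled on.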
Perhaps I'm off base in my expectations. I'm just looking for the easiest and most straightforward way to create a workflow for people whose job isn't managing their R environments. They just need tools that work, and work with minimal interaction.
Any help would be greatly appreciated.
Hoping @kevinushey may have some insight.