State-of-the-art for reproducible R projects

joshualeond · July 11, 2024, 4:13am

Hi there, I'm a huge fan of the renv package and am trying to take it a step further than simply renv + version control.

I read that rig now exists to manage R installations and I also recently saw that pak may be the current recommendation for installing R packages. There's also an option to use pak from renv which I've yet to test: renv.config.pak.enabled

I'm hoping someone can chime in and tell me the best way to combine these three tools to get reproducible R projects. At a minimum, it would be great to combine rig with renv, but it's not clear to me how to do that.

Thank you!

Gabor · July 11, 2024, 10:35am

Using pak when installing an renv lock file does not really change anything, apart from potentially making the process faster thanks to concurrency.

What is your goal? To recreate an environment from an renv lock file? I can imagine that you can parse the R version out from the lock file, then install it with rig, and then install the R packages using that R version with renv/pak.
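As a rough illustration of the first step described above, here is a minimal Python sketch that reads the R version out of an renv lock file and builds the matching rig install command. It assumes the lock file is the usual JSON document with the interpreter version stored under `R` -> `Version`; the helper names `r_version_from_lockfile` and `rig_install_command` are made up for this example, and the script only prints the command rather than running it.

```python
import json
from pathlib import Path


def r_version_from_lockfile(path):
    """Return the R version string recorded in an renv lock file."""
    lock = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the lock file is JSON with the version under R -> Version.
    return lock["R"]["Version"]


def rig_install_command(version):
    """Build (but do not run) the rig command that installs this R version."""
    return ["rig", "add", version]


if __name__ == "__main__":
    # Hypothetical usage: run from the project root, next to renv.lock.
    version = r_version_from_lockfile("renv.lock")
    print(" ".join(rig_install_command(version)))
```

Once that R version is installed, the packages themselves would still be restored from inside R with renv::restore().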

Do you have something like this in mind?

FWIW rig already has

rig rstudio <renv-lock-file>

which starts RStudio with the correct R version, as in the lock file. AFAIR it should also open the correct RStudio / renv project.

joshualeond · July 11, 2024, 1:22pm

Thank you @Gabor!

My goal is just to have the most reproducible env possible. I saw there's another package manager named pixi where you can install your Python/R interpreter version (or other things like quarto/cmdstan/etc.) from conda and all of that is captured in a lock file. But R in the conda world just doesn't appear to be a first-class citizen like Python.

Ideally I just want someone to be able to download my repo and reproduce my analysis without much knowledge of R.

I think combining rig with the renv lock file sounds really nice. I know that Positron is new but are there plans for something like the following?

rig positron <renv-lock-file>

Gabor · July 11, 2024, 1:45pm

IMO the fundamental issue with reproducibility is not R and R packages, but rather the system software: the OS, compilers, system packages.

I don't think it is realistic to assume that your code will run without changes on future OSes in 20 years, except maybe for the most trivial projects.

OTOH, one way to reproduce the system itself is to use a virtual machine or a container. This gives you maximum reproducibility, but you'll probably pay for it with a diminished developer experience.

joshualeond · July 11, 2024, 1:49pm

Yeah I feel like containers are likely overkill for my use case. I think combining rig and renv and maybe giving instructions to others on how to use them together will be sufficient. Thanks again.
