failed to retrieve package Seurat@4.3.0

My Dockerfile:

# get shiny server plus tidyverse packages image
FROM us-west1-docker.pkg.dev/my-gcp-project/shiny-verse/shiny-verse:4.3.1

# system libraries of general use
RUN apt-get update && apt-get install -y \
  git \
  curl \
  sudo \
  pandoc \
  pandoc-citeproc \
  libcurl4-gnutls-dev \
  libcairo2-dev \
  libxt-dev \
  libssl-dev \
  libssh2-1-dev \
  libpq-dev \
  libhdf5-dev \
  liblzma-dev \
  libbz2-dev \
  libglpk-dev \
  libfftw3-3 \
  libmpfr-dev \
  ## clean up
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/ \
  && rm -rf /tmp/downloaded_packages/ /tmp/*.rds
  
# Install renv
RUN R -e "install.packages('renv', repos='http://cran.rstudio.com/')"

# Initialize renv
RUN R -e "renv::init(bare = TRUE)"

RUN R -e "renv::install(c('Seurat@4.3.0'), prompt=FALSE, repos='http://cran.rstudio.com/')"

The error:

55.11 Warning: failed to find source for 'Seurat 4.3.0' in package repositories
55.11 Error: failed to retrieve package 'Seurat@4.3.0'
55.11 Traceback (most recent calls last):
55.11 7: renv::install(c("Seurat@4.3.0"), prompt = FALSE, repos = "http://cran.rstudio.com/")
55.11 6: retrieve(packages)
55.11 5: handler(package, renv_retrieve_impl(package))
55.11 4: renv_retrieve_impl(package)
55.11 3: renv_retrieve_repos(record)
55.11 2: stopf("failed to retrieve package '%s'", renv_record_format_remote(record))
55.11 1: stop(sprintf(fmt, ...), call. = call.)
55.11 Execution halted
------
Dockerfile:34
--------------------
  32 |     RUN R -e "renv::init(bare = TRUE)"
  33 |
  34 | >>> RUN R -e "renv::install(c('Seurat@4.3.0'), prompt=FALSE, repos='http://cran.rstudio.com/')"
  35 |
  36 |     # Install R packages required
--------------------
ERROR: failed to solve: process "/bin/sh -c R -e \"renv::install(c('Seurat@4.3.0'), prompt=FALSE, repos='http://cran.rstudio.com/')\"" did not complete successfully: exit code: 1

However, renv::install("Seurat@4.3.0", rebuild=TRUE, repos="http://cran.rstudio.com/") works without issues on my MacBook. Seurat@4.3.0 should work on a standard Linux OS, right? Any ideas?

The general lack of package version specifications in R packages and tools (e.g., Shiny apps) leads to a lot of reproducibility issues versus other languages (e.g., requirements.txt in Python).

It's so odd that the standard package manager (install.packages) does not allow users to specify package versions (or version ranges), and renv is very limited (one can only specify a particular package version, not a range).

Hi @nyoungblut

As an alternative, you can use the Posit Public Package Manager with a frozen repository such as https://packagemanager.posit.co/cran/__linux__/noble/2022-11-22 (since this version of the package you want was released on the 18.11.2022) to get a specific frozen list of packages. The packages are also pre-compiled for Ubuntu (and other Linux flavors), which shortens the installation process.
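A minimal sketch of how the frozen repository could be wired in, assuming the snapshot date and the noble codename from the URL above. The helper name ppm_snapshot is hypothetical, and the distro segment should match the Ubuntu release of the base image (see /etc/os-release inside the container), which may not be noble.

```r
# Hypothetical helper: build a dated Posit Public Package Manager snapshot URL.
ppm_snapshot <- function(date, distro = "noble") {
  sprintf("https://packagemanager.posit.co/cran/__linux__/%s/%s", distro, date)
}

repo <- ppm_snapshot("2022-11-22")

# Point the session at the snapshot so every install resolves against it.
options(repos = c(CRAN = repo))

# Inside the image, the install step would then be (needs network access):
# renv::install("Seurat@4.3.0", prompt = FALSE)
```

In the Dockerfile, the same two steps could go into a single R -e call in place of the failing renv::install line.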

Given the number of packages, the fact that this version is 2 years old, and that you want to use R 4.3.1 (which was released in June 2023), you will also need to think about the system libraries that you are updating before installing the package (Seurat was probably built back then with other system dependencies in mind).

So to make it truly reproducible (as in any language where you need system libraries and a bunch of transitive dependencies) is a very hard problem.

Thanks @vedoa for the information!

since this version of the package you want was released on the 18.11.2022

Similar to Python 2 vs 3, many bioinformaticians are still using Seurat v4 and have not updated to Seurat v5, given the substantial UI changes between the versions.

So to make it truly reproducible

...but why not have a straight-forward method of selecting versions when installing packages via the standard package manager, as with most other popular coding languages?

I should note that conda (and Rocker) are ways of creating more reproducible R environments, but they do not have mainstream support in R (e.g., not much support for conda in RStudio, and a limited selection of R packages on conda-forge/bioconda).

I love R for its simplicity and power for data analysis, and especially plotting. However, R is frustrating to work with versus Python in some major respects, such as i) package version specification, ii) package structure (there is no standard way of creating a hierarchical package structure; all R files in a package must be in one directory), and iii) no isolation of environments for Rmd/Quarto notebooks in RStudio, unlike Jupyter kernels (this leads to "version creep" as users install more packages as their data analysis progresses).

Disclaimer: I can't represent R, the core team's design decisions etc. in any way, but as someone who really likes the language and always has to fight for it in my daily work, here are some comments.

it is very hard

  • C/C++ still has no real package manager
  • the Golang creator admitted that he failed at this task 2 times; before modules were introduced, it was basically a big problem to adopt Go properly
  • Java's transitive dependencies are hell to configure
  • JS packages can be deleted (once you have decided which one to use) and break your build, forcing you to update everything
  • Zig only introduced a concept for package management with 0.11
  • Rust did it well with cargo (but takes ages to compile for reasons that are beyond this post)

Yeah, it sucks. It's just difficult, and given the way CRAN works I don't see an easy way out (renv is the 3rd or 4th attempt at a package to do that). As a package maintainer on CRAN you should actually make sure not to break any reverse dependencies when updating your package, but this is really difficult to maintain and enforce.

You can do it in the DESCRIPTION file when releasing a package. Example: ggplot2/DESCRIPTION at main · tidyverse/ggplot2 · GitHub
So if you have lifecycle < 1.0.1, it will error while installing. As for specifying it while installing: I will give you that this is not easily possible without the devtools or remotes packages.
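For the install-time route, a hedged sketch using remotes::install_version(). The pins vector and the install_pinned helper are hypothetical names, the repository URL is an assumption, and the only pin shown is the version from this thread. As far as I know, the version argument also accepts a DESCRIPTION-style specification such as ">= 4.0.0, < 5.0.0", which would cover ranges.

```r
# Hypothetical pin list; the package and version come from this thread.
pins <- c(Seurat = "4.3.0")

# Hypothetical helper: install each pinned package at its exact version.
install_pinned <- function(pins, repos = "https://cloud.r-project.org") {
  for (pkg in names(pins)) {
    # install_version() looks in the CRAN archive when the version is not current.
    remotes::install_version(pkg, version = pins[[pkg]], repos = repos,
                             upgrade = "never")
  }
  invisible(pins)
}

# install_pinned(pins)  # needs the remotes package and network access
```

Note that this pins only the named packages; their dependencies still resolve to whatever the repository currently serves, unless the repository itself is frozen.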

As someone who had to write a shared Jenkins library to make all R package builds possible, this was a blessing. I know the structure, I know the behavior, no surprises. No 500 edge cases like in any other programming language because no real convention is established (they will claim it is, but it's not). Here it is strictly given and I can check everything. We only need devtools and not a new nice project management system every few years, because the old one sucks (JS and Python are great examples of how a disagreement in the community leads to a lot of package design structures etc.).

You can install R dependencies in any folder you like. So making 4 different folders containing different dependencies is possible. You then have .libPaths(), with which you can add the path that holds the dependencies you want. R will also automatically detect whether there is a .Rprofile in the folder you are currently working in. If not, it falls back to looking for a .Rprofile in the home directory. There you can automate the selection of the path. Yeah, you are right, there is no virtual-env way of doing things out of the box.
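A rough sketch of the .libPaths() approach, assuming a per-project library folder. The folder name is hypothetical, and tempdir() is used only so that the example has no lasting side effects; in a project .Rprofile this would be a fixed path inside the project.

```r
# Hypothetical per-project library folder.
project_lib <- file.path(tempdir(), "project-a-lib")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Put the project library first, so install.packages() writes to it by default
# and library() searches it before the shared libraries.
.libPaths(c(project_lib, .libPaths()))

# The first entry is now the project library.
.libPaths()[1]
```

The shared libraries stay on the search path after the project folder, so this gives precedence rather than full isolation.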

Currently Rocker + Posit Public Package Manager is the way I resolve the issue.

Thanks @vedoa for the great discussion! I too love R in many ways, but I am constantly torn between R and Python because of my perceived disadvantages of R (some listed in this thread), which do not seem to be substantially changing (e.g., the continued mainstream use of install.packages, which just installs the latest package version).

it is very hard

...but it is possible with various languages (e.g., Python and Rust), and it can be done with R, if one uses conda. I created a lot of the R package recipes on conda-forge, in hopes that conda would become more mainstream among R users, but that doesn't seem to be the case. conda is not really embraced by the R community (e.g., no real support in RStudio).

you can do it in the description file when releasing a package

It is helpful there, but I work with a lot of tools in which the developers do not set R versions. Many times this is because the tool is not really bundled into an R package, and many developers in bioinformatics do not want to jump through the hoops of publishing on CRAN, so their "R package" may be half-baked. In Python, one just needs to provide a requirements.txt (or similar) with package versions set (admittedly many still do not set versions, but at least it is very easy to implement).

We only need devtools and not a new nice project management system every few years

A package hierarchy is quite straight-forward with Python (basically just adding __init__.py files to subdirectories). It doesn't have to be complicated.

You can install R dependencies in any folder you like. So making 4 different folders containing different dependencies is possible

While it can technically be done, how many people do this, or know that it is even possible? With Jupyter, one must always select a kernel from a particular environment, so isolation is always front-of-mind.

I've worked with many data scientists and bioinformaticians who have run into the issue of "version creep", where they install a new package to conduct a new analysis for their project, and the install updates many of the existing packages in their environment. The user is then unsure whether their existing code (scripts, Rmd, Quarto) will generate the same output as before, or whether the code has broken altogether. The design of R and RStudio makes this the default path: one initializes an R project and then just keeps installing packages into it, with no within-project isolation.
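One way to at least make version creep visible is to record installed versions before and after an install and compare them. This is a minimal sketch with hypothetical helper names (snapshot_versions, changed_packages); it is not a substitute for a lockfile.

```r
# Hypothetical helper: named character vector of installed package versions.
snapshot_versions <- function(lib = .libPaths()) {
  ip <- installed.packages(lib.loc = lib)
  setNames(ip[, "Version"], ip[, "Package"])
}

# Hypothetical helper: packages present in both snapshots whose version differs.
changed_packages <- function(before, after) {
  common <- intersect(names(before), names(after))
  common[before[common] != after[common]]
}

before <- snapshot_versions()
# ... install.packages("somePackage") would go here ...
after <- snapshot_versions()
changed_packages(before, after)
```

Run as written, nothing is installed between the two snapshots, so the result is empty; after a real install it lists the packages that were silently updated.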

The lack of useful stack traces (especially for Shiny app development) is also a constant source of frustration with R. I'm currently trying to update an abandoned Shiny app codebase with many thousands of lines. It is so hard to find the sources of the errors.

...and then there are the big limitations of the debugger in Shiny apps. From the docs:

Unfortunately, breakpoints aren’t helpful in all situations. For technical reasons, breakpoints can only be used inside the shinyServer function. You can’t use them in code in other .R files

...so there's no easy way to split the server code into multiple R files. This naturally leads to very long server.R or app.R files so that one can use the debugger.
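For what it's worth, two Shiny options and a plain browser() call may soften this, as far as I understand the Shiny debugging docs. The summarise_counts helper and its file location are hypothetical; the sketch only sets options and defines a function, so it runs without Shiny installed.

```r
# Ask Shiny to print full stack traces instead of its trimmed ones.
options(shiny.fullstacktrace = TRUE)

# In an interactive session, drop into the debugger whenever the app errors.
if (interactive()) {
  options(shiny.error = browser)
}

# Hypothetical helper that could live in its own file, e.g. R/summarise_counts.R.
summarise_counts <- function(counts) {
  # browser()  # uncomment to pause here, even though this is not server.R
  colSums(counts)
}
```

Unlike editor breakpoints, a browser() statement is ordinary R code, so it should work in any sourced file, at the cost of having to edit the code to add and remove it.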