Pak and renv take a long time to restore the environment

Hi everybody,

I am trying to build a Docker image that restores the dependencies described in the renv.lock and requirements.txt files (the code uses mlflow and thus needs the mlflow Python package). The problem is that renv takes a very long time to restore the packages, as it has to download and compile everything.

I have tried setting RENV_CONFIG_PAK_ENABLED=TRUE before running the restore. But then, when I first run an R command, renv finds the .Rprofile file and tries to bootstrap everything. It downloads the renv package without problems, but then it tries to download and install the pak package and hangs for a long time.

I have also tried running a container from the image and then executing pak::meta_update(), and that also takes a long time.

Running from the host machine, outside a virtual environment, also takes a lot of time when updating the metadata.

Is this the expected behavior?

This is a non-reproducible example of the Dockerfile (I am using an in-house script to download the source code, including the renv requirements, from our Nexus repository):

FROM rocker/r-ver:4.3.1

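# Update the package index and install the system libraries in a single layer,
# so that a stale cached index is never reused by the install step.
RUN apt-get update && apt-get install -y libgit2-dev python3-full python3-pip libcurl4-openssl-dev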
RUN python3 -m pip install dbt-clickhouse dp-cicd==0.2.0 -i http://nexus.int.sys.idealista/repository/pypi-public/simple --trusted-host nexus.int.sys.idealista
RUN download_from_nexus --version "==1.0.10" --base-artifact-name churn2-v --group pipelines/churn2 --extract-folder /opt/churn2
WORKDIR /opt/churn2/rchurn2
ENV RENV_CONFIG_SANDBOX_ENABLED=FALSE
ENV RENV_CONFIG_PAK_ENABLED=TRUE
#RUN R -e renv::use_python\(\"/usr/bin/python3\"\)
RUN R -e renv::restore\(\)
RUN R CMD INSTALL .

If I cannot make this work, I guess I will resort to using a mounted package cache for renv.

Any hint or help would be much appreciated.

Regards,
Gus.

I have timed the docker build call, and it takes around 20 minutes. But 10 of them are spent installing the pak package. :-S

The only way to avoid the time spent downloading and compiling would be to keep the already downloaded and compiled packages in a folder, so that they are always present. renv would then be used only for record keeping: it tracks what you need to download, compile, and bundle.

You use .libPaths() to tell R to look for libraries in a folder you provide.
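A minimal sketch of that idea, assuming the packages were installed ahead of time into a folder that is baked into the image or mounted as a volume. The path /opt/r-bundled-library and the helper name use_bundled_library() are made up for illustration:

```r
# Hypothetical folder holding packages that were installed ahead of time.
bundled_lib <- "/opt/r-bundled-library"

use_bundled_library <- function(path) {
  # Prepend the folder so R searches it before the default libraries.
  # Folders that do not exist are skipped, so the call is safe anywhere.
  if (dir.exists(path)) {
    .libPaths(c(path, .libPaths()))
  }
  invisible(.libPaths())
}

use_bundled_library(bundled_lib)
print(.libPaths())
```

Calling this early (for example from a site profile) makes library() pick up the bundled packages without any download or compilation.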

Do you actually need to install pak separately? I thought you had to add it to the renv lock file. But maybe I am wrong.

In any case, I can't think of any reason why installing pak should take that long, even if you compile it from source. If you are already installing it yourself, I suggest you install it from a pre-built binary; see the "Installing pak" page of the pak documentation.
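For reference, this is roughly what the binary install looks like, following the pattern shown in the pak installation docs. The helper names pak_binary_repo() and install_pak_binary() are only for illustration, and "stable" is assumed to be the release stream you want:

```r
# Build the URL of the pak binary repository for the current platform.
pak_binary_repo <- function(stream = "stable") {
  sprintf(
    "https://r-lib.github.io/p/pak/%s/%s/%s/%s",
    stream,
    .Platform$pkgType,
    R.Version()$os,
    R.Version()$arch
  )
}

# Install pak from that repository instead of compiling it from CRAN sources.
install_pak_binary <- function(stream = "stable") {
  install.packages("pak", repos = pak_binary_repo(stream))
}

print(pak_binary_repo())
```

In a Dockerfile you would call install_pak_binary() in its own step before renv::restore(), so that renv finds pak already installed.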

I also don't really know why meta_update() would take so long. What repositories are you using? I.e. what is the output of this?

getOption("repos")

Or, what are the repositories configured in the renv.lock file?

These are my current repos...

[Image: screenshot of the configured repositories]

By showing you this, I have noticed that maybe I could gain some time by configuring my local renv to use the posit package repository... Am I right?

Do you have other repos configured in your renv.lock file? Can you actually share that file?

Sure! Here it is: renv.lock · GitHub

Meanwhile, I have tried to start over in my local development environment, but it seems I cannot do it properly. To start from scratch, is it enough to delete the renv.lock and requirements.txt files and the renv folder? Or do we need to delete something else?

OK, that is a lot of packages indeed, so it is reasonable that compiling them from source would take several minutes. I would suggest using binary packages from Posit Package Manager (PPM).
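A minimal sketch of pointing R at the PPM Linux binaries. It assumes that rocker/r-ver:4.3.1 is based on Ubuntu 22.04 ("jammy"), so adjust the distribution name if your base image differs. The helper name ppm_repo() is made up for illustration:

```r
# Build a PPM repository URL that serves Linux binaries.
# Assumption: the image runs Ubuntu 22.04, whose PPM codename is "jammy".
ppm_repo <- function(distro = "jammy", snapshot = "latest") {
  sprintf(
    "https://packagemanager.posit.co/cran/__linux__/%s/%s",
    distro, snapshot
  )
}

options(repos = c(CRAN = ppm_repo()))

# PPM uses the HTTP user agent to pick the binary that matches the R version.
options(HTTPUserAgent = sprintf(
  "R/%s R (%s)",
  getRversion(),
  paste(getRversion(), R.version$platform, R.version$arch, R.version$os)
))

print(getOption("repos"))
```

Note that renv::restore() uses the repositories recorded in renv.lock, so the lockfile has to point at the same URL for the binaries to be used during a restore.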

Installing pak still should not take 10 minutes, so I suspect that renv is also doing some other things there. If you switch to PPM anyway, then this might be solved as well, if you install pak from PPM. Otherwise you can install our binary pak build, see above.

Lastly, meta_update() also should not take more than a handful of seconds, but that's probably a separate issue because it is also happening outside of the container.

Are you behind a proxy? Can you try to measure how long these take, either on the host or in the container?

system.time(
  download.file("https://cran.r-pkg.org/metadata/src/contrib/METADATA2.gz", tempfile())
)
system.time(
  download.file("https://cloud.r-project.org/src/contrib/PACKAGES.gz", tempfile())
)
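To help answer the proxy question, a quick check of the usual proxy environment variables might also be useful. This is only a sketch: it lists the variables that are commonly honored by libcurl-based downloads, and your setup may configure the proxy elsewhere:

```r
# Environment variables commonly used to configure an HTTP(S) proxy.
proxy_vars <- c(
  "http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY",
  "no_proxy", "NO_PROXY"
)

# Unset variables come back as NA, so they can be filtered out.
proxy_settings <- Sys.getenv(proxy_vars, unset = NA)
print(proxy_settings[!is.na(proxy_settings)])
```

An empty result means none of these variables is set in the R session.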

Hi,

Sorry for the late reply. There are a lot of dependencies indeed, but I have some good news: I have followed your advice and switched to PPM. Now the complete process takes a few minutes and is completely suitable for my use case. It seems I had an renv.lock that was pointing to CRAN, so everything was being compiled from source.

Pak installs fast in this new scenario and meta_update() works smoothly. So I guess all of this had to do with the environment I was using.

Besides, I now reference a fixed-date PPM snapshot from my renv.lock, which gives me more reproducibility at no extra cost. :-)
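In case it helps someone else, here is a sketch of how such pinning can be done. The snapshot date below is just a placeholder (not the one I actually use), the distribution name "jammy" assumes an Ubuntu 22.04 image, and pin_ppm_snapshot() is a made-up helper name:

```r
# Point the session at a dated PPM snapshot instead of "latest".
pin_ppm_snapshot <- function(date, distro = "jammy") {
  url <- sprintf(
    "https://packagemanager.posit.co/cran/__linux__/%s/%s",
    distro, date
  )
  options(repos = c(CRAN = url))
  invisible(url)
}

# Placeholder date, replace it with the snapshot you want to freeze on.
pinned_url <- pin_ppm_snapshot("2023-10-02")
print(pinned_url)

# With the option set, renv records the active repositories in renv.lock
# the next time the lockfile is written:
# renv::snapshot()
```

After that, the Repositories entry in renv.lock should show the dated URL, and renv::restore() will resolve packages against that snapshot.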

Thanks a lot for your help!!! :-)

Regards,
Gus.
