arm64 binary packages for Linux in Posit Public Package Manager

Thanks for providing the Posit Public Package Manager to the R community! It's an amazing public resource.

Does Posit plan on adding support for arm64 binary R packages for Linux in the Posit Public Package Manager, and any ETA on when that may happen?

Hi @Lmendy

Thanks for the feedback! Glad you're finding Public Package Manager (and our Linux binaries) useful!

We do have arm64 binaries on our list of requested binary distributions, but they're not currently on our short-term roadmap. Since we provide comprehensive binary support for the entirety of CRAN, not just on the latest version of R but also for the past four R releases, every distribution / platform variant we support adds over 100,000 binaries that have to be built and maintained. While we'd love to provide binaries for every requested environment, we do need to prioritize our limited resources where they're most needed.

Requests like this are appreciated and encouraged, as it does help us recognize the demand and prioritize accordingly. In that vein, can you give us a little more detail about how you use these binaries, and what arm64 Linux distribution(s) are most important to you?

Cheers,
Joe

P.S. If you haven't seen it, we recently published a post on our blog giving a little more background on our binary build process: "The Road to Building Ten Million Binaries".

Hi!

I was searching around to see if there was a plan for linux/arm64 binaries, and found your post above. I thought I would chime in here and answer your question above with my own use case:

I work between an Apple Silicon Mac (arm64) and an Intel-powered remote cluster. I would like to be able to develop multi-platform Docker containers containing a synchronised development environment, so I can test my pipelines locally before running on the remote cluster. Currently, creating the arm64 container requires compiling packages from source. This is slow, and times out if I run the build on e.g. GitHub Actions.

I would say that binaries for Ubuntu would be most important to me, as this would allow me to continue to use Rocker base images, but I don't know if that would be the most useful more generally.

I certainly appreciate the resources may not be there for this, but I hope that gives you some sense of at least one use-case, if it's still on your roadmap!

At R-hub we have an experimental repo [1] that has aarch64 Linux R packages for Ubuntu 22.04 and R 4.4. Only a subset of popular and slow-to-build packages are available and they are updated daily.

The caveat is that you need to use pak [2] to install the R packages.

[1] r-hub/repos on GitHub: custom R package repositories (work in progress)
[2] Installing pak (pak documentation)

Thanks for the info everyone! The use case we have in mind is using renv to install binary packages from Posit Package Manager in our arm64 Docker builds using Ubuntu-based images.

Thanks @cenococcum and @Lmendy for the additional info and use cases.

We are still exploring including full CRAN binaries for arm64 on selected platforms via Posit Package Manager. Hearing that both of you are interested in Ubuntu is a helpful data point. There is no timeline for availability yet, but we are still hoping to get it on the roadmap soon. Thanks as always for your interest!

Just wanted to confirm the need for this. Anyone using an Apple Silicon Mac (an increasingly large number of people) has to deal with epically slow builds from source.

Particularly when used with a Dockerized dev environment (rocker project), where it is common to reinstall all packages on every build, things become unusably slow.

Hi jpvelez, just so it's clear, we do have arm64 binaries for Mac. If you use p3m.dev on your Mac, you should see a binary in most cases. It is the case, though, that we still haven't added arm64 Linux binaries, which is what causes the issues with a Docker build using Linux on the Mac.

FWIW, the R4Pi project has aarch64 binaries for a subset of the most popular R packages for Ubuntu 24.04 LTS; you just need to add "https://pkgs.r4pi.org/noble" as your package repo.
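
As a rough sketch of what adding that repo could look like in a Docker build (an assumption on my part, not something confirmed in this thread): the fragment below assumes an earlier FROM line that selects an arm64 Ubuntu 24.04 image with R installed, and uses dplyr purely as a placeholder package name. A regular CRAN mirror is listed as well for packages the R4Pi repo does not carry.

```dockerfile
# Fragment only: assumes an earlier FROM line selects an arm64 Ubuntu 24.04
# (noble) image that already has R installed.
# "dplyr" is a placeholder package name.
RUN R -q -e 'install.packages("dplyr", repos = c(R4Pi = "https://pkgs.r4pi.org/noble", CRAN = "https://cloud.r-project.org"))'
```

Note that R picks the newest version it finds across the listed repos, so a package may still be built from source if CRAN is ahead of the R4Pi repo.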

Since it sounds like it's useful to hear interest in this, I'll just chime in that we have the same use case of containerized applications running on arm. In our case we use a mix of Ubuntu and Debian.

While I agree there is a real need to have precompiled arm64 linux binary packages....

For those that are struggling with epically slow build times for R packages now:

I would echo what @Gabor already hinted at and highly recommend using https://pak.r-lib.org/ to install packages. The relevant feature in pak is that (source) packages are downloaded, compiled and installed truly in parallel, leading to a significant speed-up compared to install.packages(). So the more cores you have available on a given server, the faster the Docker build will be.

Additionally, when run with admin privileges, pak will automatically install any system dependencies needed for the desired list of R packages.

I have been working with someone lately who was heavily using Bioconductor packages that only exist as source packages (irrespective of whether it is x86_64 or arm64). Due to the heavy use of C++, the compile times are extremely looooong. Using pak instead of install.packages(), however, reduced their docker build time from a day to 1.5 hours.
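
For anyone who wants to try this, here is a minimal sketch of a pak-based install step. It assumes a base image that already ships R and pak (ghcr.io/r-lib/rig/ubuntu:latest is used here as an assumption), and the package names are placeholders.

```dockerfile
# Assumption: this base image already includes R and pak.
FROM ghcr.io/r-lib/rig/ubuntu:latest

# One pak call for the whole package list, so pak can download, compile,
# and install the packages and their dependencies in parallel.
# The package names are placeholders.
RUN R -q -e 'pak::pkg_install(c("dplyr", "data.table", "ggplot2"))'
```

Passing the full list in one call, rather than one RUN per package, is what gives pak room to parallelize.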

Are there any other upcoming features or enhancements planned for the Posit Public Package Manager that the community should be aware of?

Thanks all for your comments here. We do have arm64 binary R packages for Linux on our roadmap, but I don't think we're ready yet to give a timeline on this. We are listening to the demand though and do see this as something we need to do here at some point.

As for other upcoming features/enhancements, we are updating p3m.dev every time we have a package manager release, so there's lots of new things coming every few months. While many of the things listed in Posit Package Manager aren't for public package manager, you will see new things showing up there, such as the date-based Bioconductor snapshots that we released recently. In addition, behind-the-scenes, we did make significant upgrades to our infrastructure-as-code for public package manager in the last year to ensure we can keep it running well.

"reduced their docker build time from a day to 1.5 hours."

a day!!! I will keep that in mind the next time I feel that my builds are slow :sweat_smile:

I did try the pak suggestion (via renv.config.pak.enabled) and on our CI setup it took the build down from ~45 minutes to ~30. However, image size is a consideration for us as well, and the final image with pak was >2x as large. Any idea why this would be the case? Cleaning up pak's cache didn't make a difference.

Happy to take this question to a different thread if that would be better.
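
For reference, a sketch of how renv's pak integration can be switched on in a Docker build. This is a fragment under assumptions: an earlier FROM line with R, renv, and pak installed, and a hypothetical /project directory holding the lockfile.

```dockerfile
# Fragment only: assumes an earlier FROM line, with R, renv, and pak installed.
# /project is a hypothetical project directory.
WORKDIR /project
COPY renv.lock renv.lock

# Environment-variable form of the renv.config.pak.enabled option.
ENV RENV_CONFIG_PAK_ENABLED=TRUE

# Restore the packages recorded in renv.lock without prompting.
RUN R -q -e 'renv::restore(prompt = FALSE)'
```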

It is getting a bit off-topic indeed. :slight_smile: Nevertheless, the image should not be bigger if it has the same packages. There are two issues I can think of, and a third that @michaelmayer told me about.

First is that pak installs system dependencies automatically, but it might install some system packages that are not needed, especially if you are using binaries from PPM. That's because it installs the build-time dependencies as well, but if you are installing binaries, then only the run-time dependencies are needed. If you are already taking care of system dependencies, then set

ENV PKG_SYSREQS=false

(Cf. [1]).

Second is that pak caches metadata and package downloads. Unfortunately there is no way currently to turn off the cache globally, but you can delete it in the same Docker step that you are using pak from, with

RUN ... && R -q -e 'pak::cache_clean(); pak::meta_clean(TRUE)'

Third is that pak leaves behind some files in /tmp; you can delete them with rm -rf /tmp/* in the same Docker step.

Here is an example. Using ghcr.io/r-lib/rig/ubuntu:latest, because it already has pak, I have three Dockerfiles, the first:

FROM --platform=linux/amd64 ghcr.io/r-lib/rig/ubuntu:latest

RUN R -q -e 'pak::pkg_install("tidyverse")'

The second:

FROM --platform=linux/amd64 ghcr.io/r-lib/rig/ubuntu:latest

RUN R -q -e 'pak::pkg_install("tidyverse"); pak::cache_clean(); pak::meta_clean(TRUE)' && \
    apt-get clean

(You don't need the apt-get clean if you turn off the sysreqs.)

Third:

FROM --platform=linux/amd64 ghcr.io/r-lib/rig/ubuntu:latest

RUN R -q -e 'pak::pkg_install("tidyverse"); pak::cache_clean(); pak::meta_clean(TRUE)' && \
    apt-get clean && \
    rm -rf /tmp/*

I get these (compressed) image sizes:

EDIT: @michaelmayer is telling me that pak also leaves files in /tmp, so I added the third Dockerfile.

[1] pak configuration: environment variables and options that modify the default behavior (pak documentation)

Thanks Gabor! This was a big help.

I do need the build-time dependencies precisely because these are arm builds where no PPM binaries are available. I had already taken care of those in my Dockerfile, so PKG_SYSREQS=false didn't have an effect on the final size.

I had tried pak::cache_clean(); pak::meta_clean(TRUE) but neglected the fact that of course this needs to be run at the same step as the installation. Putting it in the right spot reduced the compressed image size by ~400 MB. Clearing out /tmp brought it down the rest of the way, to a final image that's slightly smaller than the original :tada:
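
To spell out why the cleanup has to happen in the same step: each RUN instruction produces its own image layer, and files deleted in a later layer still take up space in the earlier one. A sketch of the two variants, assuming a base image that already has R and pak:

```dockerfile
# Assumption: this base image already includes R and pak.
FROM ghcr.io/r-lib/rig/ubuntu:latest

# Not effective: the first RUN has already committed the cache and /tmp
# files to its own layer, so deleting them later does not shrink the image.
# RUN R -q -e 'pak::pkg_install("tidyverse")'
# RUN R -q -e 'pak::cache_clean(); pak::meta_clean(TRUE)' && rm -rf /tmp/*

# Effective: install and clean up in a single RUN, so the cache and
# temporary files never end up in any layer.
RUN R -q -e 'pak::pkg_install("tidyverse"); pak::cache_clean(); pak::meta_clean(TRUE)' && \
    rm -rf /tmp/*
```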

Sorry for veering off topic, but hopefully this is useful for anyone else who is using pak to improve build times for arm-based images.

Glad this is sorted now!

While in your case the reduction from 45 to 30 minutes is not an overwhelmingly significant improvement, it would be interesting to see how much of the build time is actually spent building those R packages. As only this part is really parallelizable via pak, you could be in a situation where you reduce the time for the package build 10x but the impact on the overall build time is still negligible, because most of the time is spent outside of package installs (cf. Amdahl's law).
If, on the other hand, most of the time is spent building packages, you will probably see better speed-ups by increasing the resources (CPU and memory) for your CI runners, allowing many more packages to be built in parallel.
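
To make the Amdahl's law point concrete, here is a short worked version. The symbols p (fraction of the original build time spent installing packages) and s (speed-up of that part) are introduced here for illustration, and the 45 and 30 minute figures are the approximate ones reported above.

```latex
% Overall speed-up when a fraction p of the work is sped up by a factor s
\[
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
\]
% Observed: roughly 45 min -> 30 min, so the new time is about 2/3 of the old
\[
(1 - p) + \frac{p}{s} \approx \frac{30}{45} = \frac{2}{3}
\quad\Longrightarrow\quad
p\left(1 - \frac{1}{s}\right) \approx \frac{1}{3}
\quad\Longrightarrow\quad
p \gtrsim \frac{1}{3}
\]
```

In words: under these assumptions, at least about a third of the original build was package installation, and even an arbitrarily fast package step could not bring the build below (1 - p) times the original 45 minutes.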