Best system for sharing internal packages?

At my work, R is still treated as a "Wild West" thing. A dozen or so of us use it for analysis, and a few use it to create official reports. But the only infrastructure we have is IT pushing new versions of R/RStudio when we recommend it.

Right now, I'm the only one writing and maintaining a couple of internal packages. The distribution method is copying the built package to a shared file server and sending email notices to everyone.

But I know somebody's figured out a better way. Is it maintaining an internal repository with miniCRAN? Installing packages to a site library to make sure everyone's using the latest versions? Keeping it Wild West?

What's everyone else doing?

6 Likes

I have had some success with the drat package:
https://cran.r-project.org/web/packages/drat/vignettes/WhyDrat.html

rOpenSci has a great repository where they push the package to an S3 bucket as part of continuous integration: https://github.com/ropensci/drat

(as an aside, I would highly recommend trying out RStudio Connect. It is great in a corporate environment)

2 Likes

In our company, we use an internal CRAN-like repository built with functions from miniCRAN, drat, and packrat. Each one can help you with that. Our repository is only for internal packages and sits behind a proxy.
We also use an internal GitLab to work on R packages and host the source code. It is also possible to install packages from there (much like install_github from devtools).
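
For anyone wondering what "CRAN-like" means in practice, here is a minimal sketch using only base R, not necessarily how the repository described above is built. The idea is that built packages go into a `src/contrib` folder and `tools::write_PACKAGES()` generates the index that `install.packages()` reads. The helper names, the `/srv/r-repo` path, and the `acmepkg` package are assumptions for illustration.

```r
# Sketch only: helper names, paths, and package names are hypothetical.

# Folder inside the repository where source packages live.
contrib_dir <- function(repo_dir) {
  file.path(repo_dir, "src", "contrib")
}

# Copy a built source package (.tar.gz) into the repository and
# regenerate the PACKAGES index files that install.packages() reads.
publish_source_package <- function(tarball, repo_dir) {
  dest <- contrib_dir(repo_dir)
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  file.copy(tarball, dest, overwrite = TRUE)
  tools::write_PACKAGES(dest, type = "source")
  invisible(dest)
}

# Usage (not run):
# publish_source_package("acmepkg_0.1.0.tar.gz", "/srv/r-repo")
# install.packages("acmepkg", repos = "file:///srv/r-repo", type = "source")
```

miniCRAN and drat wrap this same layout with extra conveniences, so the result should be installable the same way whichever tool creates it.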

If it's an option for you, you can also use drat to host your repository online on GitHub Pages (or something similar).

Here is a similar discussion (oriented towards RStudio Connect)

@slopp announced further information on a project RStudio is working on. We'll see!

Some useful links on this:

2 Likes

Do you have private GitHub repos at work? I have found it really easy to just follow my normal GitHub flow when building and updating my internal rstats package, then install with devtools::install_github(). So far only one other person is using R a little bit, though, so maybe there is a wider distribution use case where this falls apart that I'm not thinking of.

4 Likes

I have a dedicated IT team, as well as DevOps and Data Engineering, but I still administer almost all of our R servers myself. For one, it's more important to me than it is to those other teams. This means I have to understand as much as necessary to satisfy IT requirements, but it also means I have full control of installing packages and updates (and users). It's a big trade-off in work, but in my opinion it's worth it.

Our internal packages are stored in a private GitHub organization. On RStudio Server instances, I maintain the system-wide package installation, but I allow users to install other packages to their home directory. Instead of running scheduled scripts via crontab, we can use Jenkins or Airflow.
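
To see how a system-wide library and a per-user library coexist on one machine, a rough sketch like the following can help. It only uses base R; the `library_report()` helper is a made-up name, and the actual paths will depend on how the server is configured.

```r
# Sketch only: library_report() is a hypothetical helper name.
library_report <- function() {
  list(
    search_path  = .libPaths(),               # every library R searches, in lookup order
    site_library = .Library.site,             # admin-managed site libraries (may be empty)
    user_library = Sys.getenv("R_LIBS_USER")  # per-user library location
  )
}

print(library_report())

# Usage (not run): install a package into the personal library only.
# user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
# dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
# install.packages("somepkg", lib = user_lib)
```

Packages in the user library typically come first on the search path, so a user's own copy can shadow the system-wide version of the same package.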

I think those other teams I mentioned also think R is the "wild west". I have built a level of trust and responsibility that they are ok with me managing the "Wild West", though. :stuck_out_tongue_winking_eye:

2 Likes

Thanks for all the advice so far. It sounds like everyone's using their own repositories instead of shared libraries, so I'll trust your experience and follow the same path.

We have Microsoft's Visual Studio Team Services, but it's only used by IT classic. I'm currently convincing management to open it up to the statisticians by doing a pilot project. We'll be using Git projects, so devtools::install_git(...) should work. I'm worried about MS' obsession with licenses, but this seems promising.

The process of "enterprization" does seem to require this position, and, as you point out, it's best done by somebody who cares. I've been looking to shake up my career, so this wouldn't be so bad.

RStudio Connect would solve a lot of our problems beyond package sharing. Thanks!

1 Like

I completely agree with that and it is a role I play in my enterprise too.
There is a great R Views blog post on this kind of position. I find it inspiring.

2 Likes

We happily use a drat repository hosted internally. The packages themselves are private GitHub repos. The only help we needed from IT was a folder to which we have write access and which is also reachable over HTTP.
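
A setup like this might look roughly as follows. This is a sketch under assumptions, not the poster's actual configuration: the folder path, the URL, and the `acmepkg` name are invented, and only `drat::insertPackage()` and the standard `repos` option are real.

```r
# Sketch only: the path, URL, helper names, and package names are hypothetical.
repo_dir <- "/srv/www/r-repo"              # folder IT made writable and serves over HTTP
repo_url <- "http://r-repo.acme.internal"  # address under which that folder is served

# Maintainer side: add a built package to the drat repository.
add_to_internal_repo <- function(tarball, repodir = repo_dir) {
  if (!requireNamespace("drat", quietly = TRUE)) {
    stop("The drat package is needed to publish packages.")
  }
  drat::insertPackage(tarball, repodir = repodir)
}

# User side: put the internal repository in front of the existing ones.
internal_repos <- function(url = repo_url, existing = getOption("repos")) {
  c(acme = url, existing)
}

# Usage (not run):
# add_to_internal_repo("acmepkg_0.1.0.tar.gz")
# options(repos = internal_repos())
# install.packages("acmepkg")
```

Putting the `options(repos = ...)` call in a site-wide or per-user `Rprofile` would let a plain `install.packages()` or `update.packages()` pick up internal packages alongside CRAN ones.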

I'm currently trying to fix this problem as well, and I'm seeing mentions of hosting a CRAN-like repo on S3, but not any detailed examples. Should it be as simple as taking the advice from here and then copying the contents to S3 once the repo is created, or is it more involved than that?

A slight wrinkle is that many examples use private GitHub, while our business uses Bitbucket exclusively, and I'm not sure how that would affect the tooling, etc.

Yeah, I think it is that straightforward.

rOpenSci uses a drat repository hosted on S3, which I found very useful as a model for our own drat repository.

1 Like

I'm just trying to understand how I would then get users to install it, as the contents are somewhat business sensitive, and the S3 bucket would be authenticated.

I suppose I could write a custom install function (install_s3()?) that would first download the contents of the S3 bucket locally and then build the package?
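
One possible shape for such a function, purely as a sketch: it assumes the bucket already holds a CRAN-like repository of built source packages, that the AWS CLI is installed, and that each user has their own AWS credentials configured. The bucket, prefix, and package names are hypothetical, and the `file://` handling is written for Unix-like systems (Windows paths would need `file:///`).

```r
# Sketch only: bucket, prefix, helper names, and package names are hypothetical.

# Arguments for `aws s3 sync s3://<bucket>/<prefix> <dest>`.
s3_sync_args <- function(bucket, prefix, dest) {
  c("s3", "sync", sprintf("s3://%s/%s", bucket, prefix), dest)
}

install_s3 <- function(pkg, bucket, prefix = "r-repo",
                       dest = file.path(tempdir(), "s3-repo")) {
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)

  # Step 1: mirror the repository locally; the AWS CLI uses the
  # caller's own credentials, so the bucket can stay private.
  status <- system2("aws", s3_sync_args(bucket, prefix, dest))
  if (status != 0) stop("aws s3 sync failed with status ", status)

  # Step 2: install from the local mirror as a file:// repository
  # (Unix-like paths assumed here).
  repo <- paste0("file://", normalizePath(dest, winslash = "/"))
  install.packages(pkg, repos = repo, type = "source")
}

# Usage (not run):
# install_s3("acmepkg", bucket = "acme-r-packages")
```

Because the mirror is a regular repository, `install.packages()` handles the build step for source packages, so no separate build logic should be needed.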

1 Like

If you use GitHub Enterprise, you might be interested in ghentr: a new package to help you share packages internally using your instance of GHE. https://ijlyttle.github.io/ghentr/

As you know, you can work with regular GitHub using:

usethis::use_github()

devtools::install_github("user/repo")

Let's say you work at Acme Corporation. If you want to do the same with your instance of GHE, you have to supply some custom arguments:

usethis::use_github(host = "https://github.acme.com/api/v3", auth_token = Sys.getenv(…))

devtools::install_github("user/repo", host = "github.acme.com/api/v3")

This can become tiresome; it might be useful to create a package, say acmetools, that has a couple of custom functions that you could use instead:

acmetools::use_github_acme()

acmetools::install_github_acme("user/repo")

The purpose of ghentr is to make it easy for you to create these functions.
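
To make the idea concrete, a hand-written wrapper could look something like the sketch below. This is only an illustration of the pattern, not what ghentr generates: the `GITHUB_ACME_PAT` environment variable and the `acme_token()` helper are made-up names.

```r
# Sketch only: a hand-written wrapper, not ghentr's generated code.
# GITHUB_ACME_PAT and acme_token() are hypothetical names.
acme_host <- "github.acme.com/api/v3"

# Read the token from the environment; return NULL when it is not set.
acme_token <- function() {
  token <- Sys.getenv("GITHUB_ACME_PAT", unset = "")
  if (nzchar(token)) token else NULL
}

# Same call as devtools::install_github(), with the GHE host filled in.
install_github_acme <- function(repo, ..., host = acme_host,
                                auth_token = acme_token()) {
  devtools::install_github(repo, host = host, auth_token = auth_token, ...)
}

# Usage (not run):
# install_github_acme("user/repo")
```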

It also has some functions to help you establish and maintain a CRAN-like repository on GHE, using drat. This would make it easier to integrate your private packages into packrat and RStudio Connect.

If you are going to rstudio::conf, I'll be talking about it there.

2 Likes