Some thoughts that I also shared on the GitHub issue you opened.
How does R work?
Base R assumes you want to install a package from CRAN. It therefore implements all the rules for this specific repository, but leaves some customization possible for other repositories.
To install a package in R with install.packages, everything relies on available.packages, which creates a database for install.packages to search. The database is used to build the download URL based on the package name, the package version (the latest one), and the type of package (source or binary). In fact, some filters are applied to get only those packages (see ?available.packages).
available.packages creates the database by parsing the PACKAGES files generated by write_PACKAGES. write_PACKAGES parses the DESCRIPTION of each package and generates three files: PACKAGES.rds, PACKAGES.gz, and PACKAGES. Only one of them is needed for available.packages to work.
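As a rough illustration of this index-driven lookup, the sketch below writes a tiny PACKAGES file by hand and lets available.packages read it from a local file:// URL. The package name mypkg, its versions, and the temporary directory are made up for the example; a real index would be generated by write_PACKAGES.

```r
# Hypothetical local repository: an index file alone is enough for
# available.packages() to build its database (no tarballs needed).
contrib <- file.path(tempfile("repo"), "src", "contrib")
dir.create(contrib, recursive = TRUE)

# Two records for the same made-up package, written in DCF format.
index <- data.frame(
  Package = c("mypkg", "mypkg"),
  Version = c("1.0.0", "1.1.0"),
  stringsAsFactors = FALSE
)
write.dcf(index, file = file.path(contrib, "PACKAGES"))

# file:// URL of the contrib directory (Windows paths need a third slash).
prefix <- if (.Platform$OS.type == "windows") "file:///" else "file://"
contrib_url <- paste0(prefix, normalizePath(contrib, winslash = "/"))

# The default filters include "duplicates", so only the latest version remains.
db_latest <- available.packages(contriburl = contrib_url)

# Leaving out the "duplicates" filter keeps every version listed in the index.
db_all <- available.packages(
  contriburl = contrib_url,
  filters = c("R_version", "OS_type", "subarch")
)

db_latest[, c("Package", "Version"), drop = FALSE]
db_all[, c("Package", "Version"), drop = FALSE]
```

With the default filters only version 1.1.0 should be listed, while the second call should return both records.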
There are two fields that can affect the behavior of install.packages:
- Path: available.packages modifies the repo URL if a Path field is present.
- File: utils::download.packages (used by install.packages to build the URL) assumes by default that the file name has the form <pkg_names>_<pkg_vers>.<ext>. The File field allows a custom file name.
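To make the effect of these two fields concrete, here is a minimal sketch using a hand-written index with one record. The subdirectory name, the custom file name, and the package itself are invented for the example, and the final URL is assembled by hand following the rule described above rather than by calling download.packages.

```r
# Hypothetical index with one record that uses the optional Path and File fields.
contrib <- file.path(tempfile("repo"), "src", "contrib")
dir.create(contrib, recursive = TRUE)
index <- data.frame(
  Package = "mypkg",
  Version = "1.0.0",
  Path    = "subdir",                    # made-up subdirectory of src/contrib
  File    = "mypkg-custom-name.tar.gz",  # made-up non-standard file name
  stringsAsFactors = FALSE
)
write.dcf(index, file = file.path(contrib, "PACKAGES"))

prefix <- if (.Platform$OS.type == "windows") "file:///" else "file://"
contrib_url <- paste0(prefix, normalizePath(contrib, winslash = "/"))
db <- available.packages(contriburl = contrib_url)

# The Path value is appended to the repository URL of that record.
db["mypkg", "Repository"]

# Use File when present, otherwise fall back to <pkg>_<version>.tar.gz
# (the default naming for source packages).
file_name <- if (is.na(db["mypkg", "File"])) {
  paste0(db["mypkg", "Package"], "_", db["mypkg", "Version"], ".tar.gz")
} else {
  db["mypkg", "File"]
}
download_url <- paste(db["mypkg", "Repository"], file_name, sep = "/")
download_url
```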
About old version support: install.packages does not support installing old package versions. You need to download the tar.gz file of the old version manually and install from this local file using install.packages("pkg_file.tar.gz", repos = NULL). This means you don't need to provide an archive.rds to install an old package. You need nothing really, but it helps to have a database in which to look up the URL.
In fact, you can provide the package name and version directly, build the URL, and try to download it.
devtools::install_version and remotes::install_version just parse archive.rds to check, before downloading, that the package exists, based on a URL built by default as <repo>/src/contrib/Archive/<package.path>. Packrat, on the other hand, just builds the URL and tries to download it. It throws an error if the download fails, and installs the package otherwise.
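The URL construction can be sketched in a few lines. This assumes that <package.path> has the CRAN form <pkg>/<pkg>_<version>.tar.gz. The helper archive_url and the example.org repository are hypothetical and not part of devtools, remotes, or packrat.

```r
# Build the URL of an archived source tarball, assuming the CRAN-like layout
# <repo>/src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz.
# archive_url() is a hypothetical helper written for this example.
archive_url <- function(pkg, version, repo) {
  file_name <- paste0(pkg, "_", version, ".tar.gz")
  # Drop any trailing slash so the pieces join cleanly.
  paste(sub("/+$", "", repo), "src/contrib/Archive", pkg, file_name, sep = "/")
}

archive_url("mypkg", "1.0.0", "https://example.org/repo/")
# "https://example.org/repo/src/contrib/Archive/mypkg/mypkg_1.0.0.tar.gz"
```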
So, if you know how packages are organized in the repo, and the file name convention, it is easy to provide a wrapper (see below).
In every case, the challenge is the dependency chain. Basically, when installing a specific version, it is better to install all the dependencies manually, because I think they are not resolved correctly otherwise. This is what packrat does using a packrat.lock file. install_version gets the latest version of the dependencies in both packages (devtools and remotes). This is not always wanted.
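One possible way to see which dependencies would need pinning is to compute the recursive dependency set from a package database and compare it with a list of pinned versions. The sketch below uses tools::package_dependencies on a small made-up database. The package names, the versions, and the lock vector are assumptions for illustration and do not reflect the packrat.lock format.

```r
# Hypothetical package database with the columns package_dependencies() reads.
db <- rbind(
  c(Package = "pkgA", Version = "2.0.0", Depends = NA,
    Imports = "pkgB (>= 1.0)", LinkingTo = NA),
  c(Package = "pkgB", Version = "1.2.0", Depends = "pkgC",
    Imports = NA, LinkingTo = NA),
  c(Package = "pkgC", Version = "0.9.0", Depends = NA,
    Imports = NA, LinkingTo = NA)
)

# Recursive Depends/Imports/LinkingTo of pkgA, taken from the database.
deps <- tools::package_dependencies("pkgA", db = db, recursive = TRUE)
deps$pkgA

# Made-up pinned versions, in the spirit of a lock file.
lock <- c(pkgC = "0.9.0", pkgB = "1.2.0", pkgA = "2.0.0")

# Any dependency without a pinned version would be resolved to "latest".
missing_pins <- setdiff(c("pkgA", deps$pkgA), names(lock))
missing_pins
```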
How does NEXUS currently work, and what is the impact?
Currently, NEXUS advises storing each version in the same repository, at the root of src/contrib. It is fine to do that.
Note that one can publish a package in a subdirectory of /src/contrib. There is no error message. However, when that is done, the package does not seem to be listed in the PACKAGES.gz file, so it can't be installed. Also, I am not sure how it is handled when trying to push the same file to another path. Things do not go so well. (But that is another issue.)
Let's say everything is at the root of /src/contrib.
With this organization, you can install an old package using:
install_packages_version <- function(pkgs, version = NULL, repos, ...) {
  # Build the file name of the source tarball
  pkg_name <- paste0(pkgs, "_", version, ".tar.gz")
  # Build the URL, knowing the file should be at the root of /src/contrib
  url <- paste(repos, "src/contrib", pkg_name, sep = "/")
  # Download to a temporary file that is removed when the function exits
  path <- file.path(tempdir(), pkg_name)
  on.exit(unlink(path))
  # Try to download; any error is mapped to a non-zero status
  status <- tryCatch(
    suppressWarnings(download.file(url, path, mode = "wb")),
    error = function(e) 1L
  )
  # A non-zero status means this specific version is not available
  if (status != 0L) {
    stop(pkgs, " not available in version ", version, call. = FALSE)
  }
  # Install from the local tar.gz, so repos = NULL (no dependency resolution)
  install.packages(path, repos = NULL, ...)
}
If you try this function, it will work as expected for installing an old package, without any need for PACKAGES files or archive.rds. (This function is inspired by packrat's behavior.)
If we don't want to rely on tryCatch to detect the error, we need a way for R to know whether a package is in the repo or not. This could be achieved by listing all package versions in the PACKAGES.gz file. That way, install.packages has all the information and still gets the latest version available, because the "duplicates" filter is applied by default. With all the info in PACKAGES.gz, it is then easy to create a custom function to get a specific version, just by filtering the info from PACKAGES.gz correctly. However, the PACKAGES.gz file will increase in size!
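Such a custom function could look roughly like the sketch below. It works on a database matrix shaped like the output of available.packages when every version is listed. The database content, the example.org repository, and the helper name find_version_url are all made up for illustration.

```r
# Hypothetical database, as available.packages() might return it if the index
# listed every version (values invented for the example).
repo_contrib <- "https://example.org/repo/src/contrib"
db <- rbind(
  c(Package = "mypkg", Version = "1.0.0", File = NA, Repository = repo_contrib),
  c(Package = "mypkg", Version = "1.1.0", File = NA, Repository = repo_contrib),
  c(Package = "other", Version = "0.3.2", File = "other-0.3.2-src.tar.gz",
    Repository = repo_contrib)
)

# find_version_url() is a hypothetical helper: look up one exact version and
# build its download URL, failing early if the index does not list it.
find_version_url <- function(db, pkg, version) {
  hit <- which(db[, "Package"] == pkg & db[, "Version"] == version)
  if (length(hit) == 0L) {
    stop(pkg, " is not available in version ", version, call. = FALSE)
  }
  row <- db[hit[1L], ]
  # Honor the File field when present, else use the default naming.
  file_name <- if (is.na(row[["File"]])) {
    paste0(pkg, "_", version, ".tar.gz")
  } else {
    row[["File"]]
  }
  paste(row[["Repository"]], file_name, sep = "/")
}

find_version_url(db, "mypkg", "1.0.0")
find_version_url(db, "other", "0.3.2")
```

Versions are compared as plain strings here, so "1.0" and "1.0.0" would not match. Comparing with package_version would be a reasonable refinement.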
As a complement, for hosted repositories, the File field could also be added to handle someone publishing a file whose name is not of the form <pkg_names>_<pkg_vers>.<ext>. It would then work no matter the name. Without the field, it does not work.
The Path field would be required if the NEXUS R plugin is to handle subdirectories of /src/contrib.
About devtools or remotes
These two are often used to install a specific version with install_version. Currently, this function uses the Meta/archive.rds file, but it would be pretty easy to add support for PACKAGES.gz.
Also, a nexus R package could be worth developing for use with the plugin. It could offer a version of install.packages that works correctly. With this kind of solution, we could also leverage the NEXUS API to get the database of what is available and use this information to get the URL of what to install.
In any case, dependency resolution is not done automatically. But this is another issue: knowing which package versions were available when another was published.
What can be done?
Basically, the plugin could reproduce what write_PACKAGES(".", latestOnly = FALSE, addFiles = TRUE, subdirs = TRUE) does. It parses the DESCRIPTION file to get all the information and writes it in the DCF format. I think this could be done without needing R, and it could stay Java only.
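To show what such a parse-and-write step amounts to, here is a minimal sketch in R that reads a made-up DESCRIPTION file and emits one index record in DCF format. The field subset and the File value are illustrative assumptions. The real write_PACKAGES emits more fields, and a Java implementation would only need to reproduce the same text format.

```r
# A made-up DESCRIPTION file, as found at the top level of a source package.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypkg",
  "Version: 1.0.0",
  "Title: An Example Package",
  "Depends: R (>= 3.5.0)",
  "Imports: utils",
  "License: MIT + file LICENSE"
), desc_file)

# Keep only a few index-relevant fields (an illustrative subset).
fields <- c("Package", "Version", "Depends", "Imports", "License")
record <- read.dcf(desc_file, fields = fields)

# Add the optional File field with a made-up tarball name.
record <- cbind(record, File = "mypkg_1.0.0.tar.gz")

# Write the record in DCF format, both plain and gzip-compressed.
out_dir <- tempfile("index")
dir.create(out_dir)
write.dcf(record, file = file.path(out_dir, "PACKAGES"))
con <- gzfile(file.path(out_dir, "PACKAGES.gz"), "wt")
write.dcf(record, file = con)
close(con)

# The index is plain "Field: value" text, one record per package version.
readLines(file.path(out_dir, "PACKAGES"))
```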
It could also stay as it is, and the specifics could be handled on the R side with custom functions. I think everything needed is there to make it work as is with custom functions.
I hope this investigation helps with adding features and improving the plugin.
I moved this topic to the R-admin category, under package management. It is new and a better place.