Discovering archived packages

package
cran

#1

Our company uses Sonatype Nexus to publish our internal software artifacts, and Sonatype has an experimental R plugin that lets an R package repository function like a bona fide CRAN repository. In general it works pretty well.

One little mismatch: Nexus (like most other artifact repositories) lets multiple versions of R packages coexist in the repo, while CRAN has just the “most recent” (highest version) artifact listed in its PACKAGES / PACKAGES.gz files, and lists previous versions in an archive.rds file in the src/contrib/Meta/ folder.

Currently:

  • The Nexus R plugin doesn’t know how to generate an archive.rds file (it generates the PACKAGES index on the fly in response to HTTP requests);
  • The remotes::install_version() function (which I have a feature branch of here, BTW) only knows how to discover older versions through the archive.rds file.

So they don’t play well together on this issue.

I’ve raised a ticket on the plugin’s GitHub queue:

https://github.com/sonatype-nexus-community/nexus-repository-r/issues/21

One thing I’m not sure of, though - is there any other mechanism in CRAN-like repositories that Nexus should be emulating to discover other versions of packages than the highest-numbered version? The archive.rds file seems kind of clunky, needing to download the entire file - for all packages in the repo - just to query for one package. And an RDS file is going to be a bit unpleasant to generate in the Nexus code, because it’s a Java app that doesn’t have R.

Anyone in this forum thought about this issue before?


#2

I am not entirely sure what your question is here, but here are some thoughts.

R is completely fine with having multiple versions of the same package in the repository. These can be in the same directory, or in different directories if you use a Path entry in PACKAGES.

CRAN does use this occasionally, although not very often. I suspect the reason for not using it more often is that they don’t want to support (and test!) multiple versions of a package. But right now, for example, PACKAGES has multiple builds (of the same version!) of the recommended packages:

Package: Matrix
Version: 1.2-12
Priority: recommended
Depends: R (>= 3.0.1)
Imports: methods, graphics, grid, stats, utils, lattice
Suggests: expm, MASS
Enhances: MatrixModels, graph, SparseM, sfsmisc
License: GPL (>= 2) | file LICENCE
NeedsCompilation: yes

Package: Matrix
Version: 1.2-12
Priority: recommended
Depends: R (>= 3.5)
Imports: methods, graphics, grid, stats, utils, lattice
Suggests: expm, MASS
Enhances: MatrixModels, graph, SparseM, sfsmisc
License: GPL (>= 2) | file LICENCE
MD5sum: 7b223434ec50b0f6f75ce4fa3dc080e5
NeedsCompilation: yes
Path: 3.5.0/Recommended

available.packages has default filters that select the appropriate version for the current platform. If multiple versions are appropriate, then the latest one is selected. See ?available.packages for more about filters.
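To make the selection rule concrete, here is a rough sketch, in Python rather than R, of what such filtering amounts to: parse the PACKAGES stanzas, drop entries whose R requirement in Depends is not met, and keep the highest remaining version. This is not R's actual implementation. The names parse_packages, r_requirement_ok and select_entry are made up for illustration, and the version comparison (split on "." and "-", compare as integer tuples) is only an approximation of R's rules.

```python
import operator
import re

# Two PACKAGES stanzas for the same package, abridged from the example above.
SAMPLE = """\
Package: Matrix
Version: 1.2-12
Depends: R (>= 3.0.1)

Package: Matrix
Version: 1.2-12
Depends: R (>= 3.5)
Path: 3.5.0/Recommended
"""

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq}


def parse_packages(text):
    """Parse PACKAGES (DCF) text into a list of dicts, one per stanza."""
    records = []
    for stanza in re.split(r"\n\s*\n", text.strip()):
        record, key = {}, None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key is not None:
                # Continuation line: append to the previous field.
                record[key] += " " + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                key = key.strip()
                record[key] = value.strip()
        if record:
            records.append(record)
    return records


def version_key(version):
    """Approximate R version ordering: '1.2-12' -> (1, 2, 12)."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def r_requirement_ok(depends, r_version):
    """Check the 'R (op x.y.z)' clause of a Depends field, if there is one."""
    match = re.search(
        r"(?:^|,)\s*R\s*\(\s*(>=|<=|==|>|<)\s*([0-9][0-9.\-]*)\s*\)",
        depends or "",
    )
    if match is None:
        return True  # no constraint on R itself
    op, required = match.groups()
    return OPS[op](version_key(r_version), version_key(required))


def select_entry(records, package, r_version):
    """Highest-version entry of `package` usable on `r_version`, or None."""
    usable = [r for r in records
              if r.get("Package") == package
              and r_requirement_ok(r.get("Depends"), r_version)]
    return max(usable, key=lambda r: version_key(r["Version"]), default=None)
```

With the two Matrix stanzas, select_entry(parse_packages(SAMPLE), "Matrix", "3.4.4") returns the entry without a Path, because the second build requires R >= 3.5. When several usable entries share the same version, which one wins in this sketch is an arbitrary tie-break and should not be read as R's behaviour.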

As for CRAN-like repositories having an Archive directory and an archive.rds file, this is not required, and AFAIK CRAN is the only repository that has this. (Bioconductor, for example, does not.) The archive.rds file is currently only used by CRAN for R CMD check checks. Some user-space packages also use it, e.g. devtools::install_version().

So, if you want to support multiple versions of your own packages, I would say that you can just add all supported versions to the main repository (i.e. not in Archive). Unfortunately, install.packages and available.packages do not really give you the tools to select the version you want to install. But they are at least extensible, and maybe you can write a filter for available.packages and a wrapper around install.packages that makes it easier to select the desired version.
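On the repository side, listing every version just means emitting one stanza per published artifact. A minimal sketch of that, in Python and purely for illustration (the artifact list, the field set and the build_index name are all hypothetical, and a real index would carry more fields such as Depends and MD5sum):

```python
# Hypothetical list of published artifacts; a real repository would read
# these from each package's DESCRIPTION file.
ARTIFACTS = [
    {"Package": "XPack", "Version": "2.0", "NeedsCompilation": "no"},
    {"Package": "XPack", "Version": "0.7", "NeedsCompilation": "no",
     "Path": "older"},  # optional subdirectory relative to the index
]


def format_stanza(fields):
    """Render one artifact as a DCF stanza, skipping empty fields."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items() if value)


def build_index(artifacts):
    """Render a PACKAGES-style index with one stanza per artifact."""
    ordered = sorted(artifacts, key=lambda a: (a["Package"], a["Version"]))
    return "\n\n".join(format_stanza(a) for a in ordered) + "\n"


INDEX_TEXT = build_index(ARTIFACTS)
```

The point is only that nothing R-specific is needed to produce this file: it is plain text with blank lines between stanzas, which is much easier to generate from a Java application than an RDS file.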

If R can select the correct version based on the requirements of the packages (i.e. the R entry in the Depends field), then you don’t need to do anything: install.packages and available.packages will just work.


#3

Thanks for the info @Gabor, I was unaware of the Path entry. It looks like Nexus doesn’t include Path in its PACKAGES.gz matrix, so adding that would be a necessary step toward letting people choose which published version they want to install.

Is the Path entry documented somewhere that I could read? I see a little bit in the writePACKAGES docs - is that basically the extent of it?


#4

So, a follow-up question - when Path is set in the PACKAGES.gz file, how does a client go about discovering which packages are actually present in the repository?

For example, suppose a client wants to install package XPack with version number greater than 0.5 and less than 1.0. Version 2.0 and 0.7 are published on the repository. How does the client find out that version 0.7 exists? Does it have to be explicitly listed in the PACKAGES.gz file, or does the repository have to support directory listings, or something else?


#5

PACKAGES.gz is the package database, so whatever package is listed there must also be present, either in the same directory as the PACKAGES.gz file itself or, if Path is used, at the specified Path.
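As a rough illustration of that rule for source packages, the download location can be derived from the index entry alone. The sketch below is Python, not R's own code; the repository URL and the tarball_url name are hypothetical, and it assumes the usual Package_Version.tar.gz file naming under src/contrib.

```python
def tarball_url(repo_url, record):
    """Build the source tarball URL for one PACKAGES entry.

    Assumes the index lives in <repo>/src/contrib and that the file is
    named <Package>_<Version>.tar.gz; Path, if present, is taken to be
    relative to the directory holding the index.
    """
    base = repo_url.rstrip("/") + "/src/contrib"
    path = record.get("Path")
    if path:
        base += "/" + path.strip("/")
    filename = "{}_{}.tar.gz".format(record["Package"], record["Version"])
    return base + "/" + filename


# Hypothetical repository URL, for illustration only.
REPO = "https://nexus.example.com/repository/r-hosted/"

with_path = tarball_url(
    REPO, {"Package": "Matrix", "Version": "1.2-12", "Path": "3.5.0/Recommended"}
)
without_path = tarball_url(REPO, {"Package": "Matrix", "Version": "1.2-12"})
```

Here with_path ends in src/contrib/3.5.0/Recommended/Matrix_1.2-12.tar.gz and without_path ends in src/contrib/Matrix_1.2-12.tar.gz, so the client never needs a directory listing.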

In your example, yes, 0.7 must be in the PACKAGES* file(s).
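A client-side wrapper could then pick a version from the index entries it has already downloaded. Here is a small sketch of that for the XPack example, in Python for illustration; pick_version is a made-up name, the bounds are exclusive to match "greater than 0.5 and less than 1.0", and the version comparison is only an approximation of R's.

```python
import re

# Index entries as they might look after parsing PACKAGES (hypothetical data).
RECORDS = [
    {"Package": "XPack", "Version": "2.0"},
    {"Package": "XPack", "Version": "0.7"},
    {"Package": "Other", "Version": "0.9"},
]


def version_key(version):
    """Approximate R version ordering: '1.2-12' -> (1, 2, 12)."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def pick_version(records, package, lower, upper):
    """Highest entry of `package` with lower < Version < upper, or None."""
    low, high = version_key(lower), version_key(upper)
    matches = [r for r in records
               if r["Package"] == package
               and low < version_key(r["Version"]) < high]
    return max(matches, key=lambda r: version_key(r["Version"]), default=None)


chosen = pick_version(RECORDS, "XPack", "0.5", "1.0")
```

With these records, chosen is the 0.7 entry. If 0.7 were missing from the index, the function would return None, because the client has no other way to learn that the file exists.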

Btw, a dependency with an upper version limit is usually not a good idea. R currently cannot load multiple copies of the same package, so if two packages need incompatible version ranges of a third, the requirements become unresolvable.