URL Encoding or User Agent Encoding?

We are working on adding support for pre-compiled Linux binaries of R packages in RStudio Package Manager. If you aren't familiar with package binaries this thread probably won't interest you, but imagine a world where installing the tidyverse on a Linux server takes seconds instead of 30-40 minutes!

One of our critical goals is to ensure R users can access these binaries with minimal work, through install.packages (the same way they would access Windows or Mac binaries from CRAN).

Unfortunately, we cannot change the R core function install.packages so that it requests binaries for the client's platform. For context, when R installs a package on Windows, install.packages knows to look for package binaries at CRAN_REPO/bin/windows/ instead of CRAN_REPO/src/contrib.

Short of changing install.packages, our plan is to use RSPM's advanced routing capabilities to serve binaries (or source as a fallback) when install.packages makes a request to RSPM_REPO/src/contrib. But we still need a way for the client to tell RSPM which platform and version of R are in use, so that RSPM can serve the appropriate binary. We are considering two alternatives:

  1. Encoding this information in the RSPM_REPO URL. In this scenario, the repo URL would look something like: https://r-pkgs.company.com/repo_name/__linux__/operating_system_id/latest/src/contrib. The operating_system_id would be based on the different RSPM open source build systems, and admins would pick the repo URL best suited for their infrastructure.

  2. Encoding this information in the user agent header. In this scenario, the repo URL would stay the same across all operating systems: https://r-pkgs.company.com/repo_name/latest/src/contrib, but the UA header would include something like: R 3.6.1 ... rspm_distro: operating_system_id.

Note that in either scenario RSPM parses the user agent header to determine the R version.
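To make the difference concrete from the client's side, here is a rough sketch of the two alternatives. The host, repo name, and operating_system_id are the placeholders from the list above, and the rspm_distro suffix is only the hypothetical format floated in this post, not something RSPM is known to parse.

```r
# Placeholders taken from the post; neither is a real host or distro id.
base   <- "https://r-pkgs.company.com/repo_name"
distro <- "operating_system_id"

# Alternative 1: the platform is part of the repo URL.
repo_url_encoded <- sprintf("%s/__linux__/%s/latest", base, distro)

# Alternative 2: one repo URL for every OS; the platform travels in the
# user agent. The "rspm_distro" suffix is the post's hypothetical format,
# appended here to whatever user agent is already configured.
repo_url_plain <- sprintf("%s/latest", base)
user_agent <- paste0(getOption("HTTPUserAgent"), " rspm_distro: ", distro)

# install.packages() appends src/contrib to the repo URL for source packages,
# so in both cases the request lands on a .../src/contrib path.
contrib.url(repo_url_encoded, type = "source")
contrib.url(repo_url_plain, type = "source")

# An admin would then set one of these (left commented out on purpose):
# options(repos = c(RSPM = repo_url_encoded))
# options(repos = c(RSPM = repo_url_plain), HTTPUserAgent = user_agent)
```

With alternative 1 the admin only touches the repos option; with alternative 2 the admin has to manage HTTPUserAgent as well.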

Which "encoding" method would be easiest for R admins to set? The easiest for you all to debug and troubleshoot?

A few other interesting notes:

R versions < 3.6 include a default user-agent header. R versions after 3.6 do not, and require an admin to set the user agent header by way of the HTTPUserAgent R option.

Newer versions of RStudio set this option by default (though RStudio will respect the option if it is already set elsewhere).

It's probably (way) too late now, but just having stumbled over this, I'd vote for encoding it in the URL.

What tripped me up was that the client platform did seem to be partly encoded in the URL, as in "https://packagemanager.rstudio.com/all/__linux__/bionic/279" (bionic).
This led me to tacitly assume that I'd be all set, and it took me a while comparing options() between different images until I found the difference: options("HTTPUserAgent").
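For anyone repeating that comparison, something like the sketch below might save time. The diff_options helper is hypothetical, and the two lists are made-up stand-ins for options() captured on each image.

```r
# Return the names of options whose values differ between two option lists.
diff_options <- function(a, b) {
  keys <- union(names(a), names(b))
  differs <- vapply(keys, function(k) !identical(a[[k]], b[[k]]), logical(1))
  sort(keys[differs])
}

# Toy stand-ins for options() captured on two images (values are made up).
image_a <- list(
  HTTPUserAgent = "R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)",
  digits = 7
)
image_b <- list(digits = 7)  # HTTPUserAgent is not set at all here

diff_options(image_a, image_b)
```

In practice one could presumably dump options() to a file on each image with saveRDS() and feed both lists to the helper after readRDS().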

Of course I should have just read the manual, which is pretty clear on this:

If this diagnostic fails, R should be configured to include the R version and OS in the user agent header:

# Set the default HTTP user agent
options(HTTPUserAgent = sprintf("R/%s R (%s)", getRversion(), paste(getRversion(), R.version$platform, R.version$arch, R.version$os)))
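A possible variation on the snippet above, for setups where something else may already have configured the option: only overwrite HTTPUserAgent when the current value does not mention the running R version. The helper names are made up, and checking for the version string is my assumption about what matters, not a documented RSPM rule.

```r
# Build the user agent string the same way as the admin guide snippet.
rspm_user_agent <- function() {
  sprintf("R/%s R (%s)", getRversion(),
          paste(getRversion(), R.version$platform, R.version$arch, R.version$os))
}

# Hypothetical helper: set the option only if the current value is missing
# or does not contain the running R version.
ensure_user_agent <- function() {
  current <- getOption("HTTPUserAgent")
  has_version <- !is.null(current) &&
    grepl(as.character(getRversion()), current, fixed = TRUE)
  if (!has_version) options(HTTPUserAgent = rspm_user_agent())
  invisible(getOption("HTTPUserAgent"))
}

ensure_user_agent()
getOption("HTTPUserAgent")
```

This keeps a user agent that already carries the R version untouched and falls back to the admin guide's string otherwise.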

But since that document is the admin guide, and I wasn't administering RSPM, but just using it, I never thought to look.

Perhaps this is another thing that might be surfaced more prominently, such as when the user copies the URL?

I understand that RStudio might be a bit cautious about this, because you can't guarantee that the binaries will work on, say, every bionic box running R 3.6.3 out there, and the (somewhat arcane?) HTTPUserAgent requirement makes sure that it effectively only works on RStudio-sanctioned images.

This also seems to have tripped up someone in the thread "RStudio Package Manager 1.1.0 is out now".

Thanks Max, we can look at other ways to surface this requirement, as it is a bit tricky. If you happen to use an R binary from github.com/rstudio/r-builds, the user agent should be set correctly by default.
