Determining which version of R to depend on


#1

Recently I asked about this very topic on Twitter, which sparked the creation of a neat shiny app by Andy Teucher which checks the R dependency of each package imported/suggested by a certain package.
@hadley noted uncertainty regarding this approach, so I would like to discuss this further.

My initial question was more concerned with the “direct” dependency on a specific version of base R, which I am entirely unsure how to determine: my primary package makes use of various functions from base, stats and methods, but I wouldn’t know where to start if someone were to ask me which minimal R version I rely on. Checking each imported package’s R dependency seems reasonable, but it doesn’t actually factor in where the actual dependency on R comes into play.

I am aware of Travis CI, which I already use to verify that my package works on release and devel, but when it comes to checking older versions, Travis falls short because packages I depend on can’t be installed due to a higher R version dependency.

The question is then: Ignoring package dependencies, what is the bare minimum R version my package actually relies on, and therefore:
If every package were required to declare an absolute minimum R version instead of just declaring the most recent version for convenience, wouldn’t that improve the R package ecosystem, because it would allow many older, possibly strictly managed R installations (maybe on “conservative” OSes like Debian or CentOS) to make use of more packages?

Are there devtools-like tools to assess your usage of base R functions with regard to version requirements? Does the backports package need to be considered? Is this question even useful in the grand scheme of things? All I wanted to do was to fill in the Depends field in my package DESCRIPTION responsibly, and all I got was this headache.

Please share your thoughts or experience.


#2

I think this is a great question, so I look forward to the discussion.

The Tidyverse has a nominal goal to support R >= 3.1, which we are actively working to make true. IME the backports package is very useful and its README is the most succinct summary around of exactly which functions are going to cause you trouble.

But, yes, getting this straightened out for yourself isn’t a complete solution, if packages you depend on declare a higher R version dependency than is truly necessary.
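To make the backports idea concrete, here is a minimal sketch of the kind of compatibility shim it saves you from writing by hand. This is not how backports itself is implemented; `trimws_compat` is a made-up name, and the fallback only covers the default behaviour of `trimws()`, which was added to base in R 3.2.0.

```r
# Use base::trimws() where it exists (R >= 3.2.0); otherwise fall back to a
# simplified stand-in that strips leading and trailing whitespace.
# `trimws_compat` is a hypothetical name used only for this illustration.
trimws_compat <- if (getRversion() >= "3.2.0") {
  base::trimws
} else {
  function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)
}

trimws_compat("  hello ")
```

With a shim like this (or with backports), the package itself no longer needs R >= 3.2.0 just for that one function.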


#3

This is a great discussion, thanks for starting it @jemus42. The question I have is: is there a case where you would depend on an R version lower than the highest used by your dependencies? If so, I think I’m missing something, as I don’t see how you could ever test that.


#4

A system could be built which checks your package’s use of functions added in various versions of R, however some limitations would apply.

  1. While newly added functions should be in the release notes, there is no compiled list of these functions anywhere (backports is a decent start, but I am actually not sure it is exhaustive). It also only goes back to R ~ 3.0. So someone would have to compile this from reading the release notes.
  2. R packages can call functions in a large number of ways, e.g. imported functions from the NAMESPACE, regular function calls foo(), things like do.call("foo"), or more esoteric ways (get("foo")()) etc. It is also possible to call R functions from compiled code. Supporting all of these methods without false positives is non-trivial.
  3. Some additions / changes are to a function’s API or behavior rather than an entirely new function. Checking for such usage is again possible, but non-trivial.

Because of these challenges, static analysis can be at best an approximation.
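As a rough illustration of what such a static check could look like, here is a minimal sketch. The lookup table is hand-written and far from complete (it lists only a few base functions and the R versions that introduced them), and `min_r_version_guess` is a hypothetical helper, not part of devtools or any other package.

```r
# Illustrative, incomplete lookup table:
# base function -> R version that introduced it.
introduced_in <- c(
  anyNA      = "3.1.0",
  trimws     = "3.2.0",
  lengths    = "3.2.0",
  dir.exists = "3.2.0",
  startsWith = "3.3.0",
  isFALSE    = "3.5.0"
)

# Hypothetical helper: parse R code and report which table entries appear
# as symbols in it, with the highest required version first.
min_r_version_guess <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  used <- unique(unlist(lapply(exprs, all.names)))
  hits <- introduced_in[intersect(names(introduced_in), used)]
  if (length(hits) == 0) return(NULL)
  hits[order(numeric_version(hits), decreasing = TRUE)]
}

min_r_version_guess('x <- trimws(y); if (startsWith(x, "a")) n <- lengths(l)')
```

This shows the limitations listed above quite directly: a variable that happens to be named `lengths` would be a false positive, and `do.call("trimws", ...)` would be missed entirely because the name only appears as a string.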

An alternative approach would be to just install a given package (and its dependencies) for a given version of R, then run the tests; but this too has drawbacks.

  1. The functions may be used only for certain OS-platforms, so multiple platforms would have to be tested to be sure the version was compatible.
  2. If a package does not have good test / example coverage, the version specific functions may never be called at all.
  3. If you want to test a number of different R versions installing the package + dependencies for each one will become computationally expensive.

Ideally something like this could be implemented in rhub, so you could easily test results across a variety of systems.

One thing we could add to e.g. devtools now would be the ability to override a given version dependency for a package when you install it. You can already do this by manually changing the version of the package before installation. However I hesitate to do so because it may substantially increase and confuse bug reports if people are habitually installing packages on versions of R which cannot actually support them.


#5

This could make sense if you’ve got a dependency that may have an unnecessarily high minimum version of R and you hope it will come down. For example, I’ve heard quanteda brought up twice recently in this context (R ≥ 3.4.0 at the time I write this). For the record, I have no personal expertise here – maybe quanteda really does need R >= 3.4! But maybe it doesn’t and, next time it updates on CRAN, that will come down.

On your end, you could consider moving such a package from Imports to Suggests and putting your calls to it behind the necessary checks for presence and version.
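A minimal sketch of what “behind the necessary checks” could look like, assuming the package has been moved to Suggests. `has_suggested` and `tokenize_text` are made-up names, the minimum version "1.0.0" is a placeholder rather than a real requirement, and the fallback is deliberately crude.

```r
# Hypothetical helper: TRUE if `pkg` is installed and, optionally,
# at least `min_version`.
has_suggested <- function(pkg, min_version = NULL) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  if (is.null(min_version)) return(TRUE)
  utils::packageVersion(pkg) >= min_version
}

# Hypothetical wrapper: use the suggested package when available,
# otherwise fall back to something simpler from base R.
tokenize_text <- function(x) {
  if (has_suggested("quanteda", "1.0.0")) {
    quanteda::tokens(x)
  } else {
    strsplit(x, "[[:space:]]+")
  }
}
```

Your own package then installs on older R versions even when the suggested package cannot, at the cost of maintaining a fallback (or a clear error message) for that code path.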

I don’t love solutions where you record one minimum R version in DESCRIPTION (the max of the min’s of all your dependencies), but keep another one (based on your direct usage of base packages)… in your head? In some file of notes? How do you record why your minimum version is what it is and re-examine that periodically? Therefore, I am drawn to any set of tools or habits that helps make the declared minimum version as authentic as possible.


#6

Thanks @jimhester, that’s really helpful. I’m probably oversimplifying this (and/or missing a key point), but if package a depends on package b, is it the responsibility of the author of a to ensure that the R version required by b is reasonable? We already put a lot of trust in the authors of packages that our packages depend on, and I’m not sure that this is an exception.

And even if the author of a determines that b can run just fine on an older version of R, I would think the best approach would be to talk to the author of b rather than try to work around it.

From a user’s perspective, if package a depends on R 3.2 and b depends on 3.3, the user is going to be surprised when they try to install a on their system with R 3.2 and it fails because b couldn’t be installed.
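To spell out that arithmetic, here is a small sketch that computes the R version a user effectively needs: the maximum of the package’s own declared minimum and the minimums declared by its dependencies. Both helper names are hypothetical, and the Depends strings are made-up examples in the style of a DESCRIPTION field, not taken from real packages.

```r
# Hypothetical helper: pull the "R (>= x.y.z)" requirement out of
# Depends-style strings; NA where no R requirement is declared.
extract_r_min <- function(depends) {
  depends[is.na(depends)] <- ""
  pattern <- "(^|,)[[:space:]]*R[[:space:]]*\\(>=[[:space:]]*([0-9.-]+)[[:space:]]*\\)"
  m <- regmatches(depends, regexec(pattern, depends))
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_, character(1))
}

# Hypothetical helper: the effective minimum is the highest of all minimums.
effective_r_min <- function(own, dep_depends) {
  mins <- c(own, extract_r_min(dep_depends))
  mins <- mins[!is.na(mins)]
  as.character(max(numeric_version(mins)))
}

# Package a declares 3.2.0; its dependency b declares 3.3.0.
effective_r_min("3.2.0", c(b = "R (>= 3.3.0), methods"))
# "3.3.0"
```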


#7

Really good points @jennybryan (I think we were posting at the same time)


#8

This is exactly what started this question for me, as my package tadaatoolbox does not rely on any base R functions mentioned by backports, but rather heavily depends on other packages. One of those packages is DescTools (which I couldn’t find fully on GitHub besides the CRAN mirror). It depends on R >= 3.3.1 at the time of this writing, yet for my package, that dependency seems oddly high, which is why I was wondering if I then have to depend on >= 3.3.1 as well, or just might get away with depending on e.g. >= 3.1. But that wouldn’t really make practical sense in my case anyway, as multiple dependencies rely on >= 3.2, and here I am, wondering what the R dependency even means anymore.

Another factor is the R-devel requirement to depend on patch level zero, e.g. >= 3.3.0 rather than >= 3.3.1, which had me wondering if it would be appropriate to nudge package authors to be more considerate about their R dependency in some way or another.
Anyway, @jimhester confirms my suspicion that this is not a trivial problem, and I’m not just overlooking some easy solution or best practice advice right now, so at least I have that :relieved:


#9

For me, all of this raises the question: what is the intended purpose of specifying a minimum R version in a package? Is it the R version required by your use of the functions in the base packages? Or is it the R version required for your package to be installable and functional (i.e., to get all dependencies installed)? I suppose I had always assumed the latter, but maybe the former is better (even though likely harder to figure out). It is a clearer separation of concerns, as @jennybryan alluded to…


#10

I think I’m the cause of much of this confusion because for a while devtools::create_description() would automatically require the version of R that you called the function from. At the time, my thinking was that you should be conservative and only claim to support the versions that you’ve actually tested on (typically, only the version you’re currently using). But in practice, that is not a good approach because versions of R don’t differ by that much, and most packages will mostly work fine on older versions of R. Explicitly requiring a recent version prohibits people from using an older version, and the net effect is more pain.

Now, I think you should only explicitly require an R version under two circumstances:

  • You are required to by R CMD check

  • You have checked that your package works on multiple versions of R and you want to assert that. (This is now easy on travis)

Unfortunately, however, there’s no way to test if your package works if a dependency has explicitly declared that it needs a more modern version - there’s a major clash between the approach I previously advocated and the approach that I now advocate.

I think that means in most cases, if there is any doubt, you should simply omit the version specification. In the tidyverse we are taking a more aggressive approach (because we have more R package development time/resources available to us), slowly pushing the tested version of R back to 3.1 for all packages. This was a lot of work at first, because (e.g.) to get ggplot2 working on R 3.1 we needed to get every dependency also working, but now that we’ve fixed most of the infrastructure packages, testing with >=3.1 is now typically pretty easy for a new package.


#11

Thanks @hadley. That makes a lot of sense to me. In that way you’re:
a) Not promising something that you haven’t actually tested, and
b) Not (potentially) constraining a package to only work with a higher version of R than it actually can.

I think WRE is actually consistent with this advice, even if not quite as explicitly:

The ‘Depends’ field can also specify a dependence on a certain version of R — e.g., if the package works only with R version 3.0.0 or later, include ‘R (>= 3.0.0)’ in the ‘Depends’ field.


#12

If I’m seeing this right the bottom line is:

  1. Only explicitly depend on an R version if you are confident about that dependency
  2. If checks/tests require a specific version, use it.
  3. If you don’t know any reason to depend on a specific version, omit the R (>= *) from your Depends field.

Would that be a reasonable summary?
In accordance with this, I’ve tried removing my package’s Depends field, which doesn’t seem to be that good of an idea, since Travis checks and my local check yield

 checking data for ASCII and uncompressed saves ... WARNING
  Warning: package needs dependence on R (>= 2.10)

…which might be due to some aspect of the package, like explicitly encoded UTF-8 characters or saved data, so I guess I’ll just use R (>= 2.10) as my dependency then… with hesitant confidence :slight_smile:
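As far as I understand, this particular warning is usually tied to saved data: files compressed with bzip2 or xz can only be read by R >= 2.10. A small sketch of how to inspect that with `tools::checkRdaFiles()`; the file name and data here are invented for the example, and in a real package you would point it at the `data` directory instead.

```r
# Save a small object with xz compression into a temporary .rda file
path <- file.path(tempdir(), "example_data.rda")
example_data <- data.frame(x = 1:10)
save(example_data, file = path, compress = "xz")

# Report size, ASCII flag, compression type and serialization version
info <- tools::checkRdaFiles(path)
info$compress
```

If the compression column shows bzip2 or xz for any file, declaring R (>= 2.10) is what the check is asking for.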


#13

This is a good question. To date, the packages I make just use the version of R I happened to have installed at the time of first development, and I leave it at that over future versions until I run into the very problem you describe, where a dependency can’t install because it needs a higher R version.

The rhub and Travis checks are mostly concerned with the current and future versions of R, but functionality that lets you find the minimum version of R you can rely on would be super.


#14

For checking different versions of base R, this may be useful:

https://hughjonesd.shinyapps.io/rcheology/