What has been the fuss with CRAN lately?

Andrea · May 17, 2018, 9:26am

disclaimer: I don't want to start a flame war, this is just an honest question out of curiosity (I'm as curious as the proverbial cat), and I think this is the best place to ask it:

it's clearly out of place on the SE hierarchy
it may be suitable for Twitter or Reddit, but based on my DerpLearning model I estimate the risk of a flame war to be at least twice as high as here (also, on Twitter the character limit doesn't help)
it is suitable for the R4DS community too (which is great!), but for reasons I decided to post here

After the boring intro, I get to the point: lately I've been hearing rumors about "issues"(?) with some new CRAN policy, but I wasn't able to get more details. I decided not to care, but then I started to notice something weird:

CRAN reprex, which I've been using regularly for months now, doesn't capture the clipboard content anymore (v.0.1.2): I had to use the GitHub version. Nothing major, of course, but since that's not mentioned in the README or in the reference manual of the CRAN version, an R beginner who is not familiar with GitHub issues may get confused
package bigstatsr is not on CRAN anymore. The message says "Archived on 2018-02-02 as usage restrictions in README were incompatible with a FOSS licence.", but reading the README file I couldn't find usage restrictions. Again, no biggie (I'll just keep using bigmemory).
a few other oddities with other packages.

All in all, it doesn't seem like a big deal, or at least it hasn't impacted my workflow in any significant way, but it could be a hassle for package maintainers, I guess. So, can anyone enlighten me on what has changed with CRAN recently?

Frank · May 17, 2018, 10:56am

It doesn't sound like any sort of broad CRAN issue, just issues with the packages you mentioned, probably each for their own idiosyncratic reasons that can be investigated separately..? I guess R 3.5 introduced some new features that may require more package changes than the typical release (but I haven't upgraded myself...).

In the case of reprex, I heard about this months ago. Looks like they're in the negotiating phase:

I hope they work that out, since I think the package has improved communication on SO and here (even though I don't use it myself .. yet).

I've never heard of bigstatsr, but asking on GitHub (as I see you've done) seems like the best way to find out.

Huh, figured r4ds was also covered by this forum, though now that I look I do not see a tag for it..?

Andrea · May 17, 2018, 11:59am

Hi,

re: bigstatsr, yes, I've asked there, but since two coincidences are a proof (or was it three?), I was wondering if there was some bigger issue behind it. I've been using R for a few years now, and I've never seen so many packages on CRAN removed or having some functionality crippled with respect to their version on GitHub. It may well be confirmation bias, though, or just that the change to 3.5.0 had a bigger impact than others. Now that you make me think of it, I believe I recall something similar happening some years ago, around the time of some other major release.

Ha, ha, I was just mentioning the great R4DS community, which is a Slack community created by Jesse Maegan and unrelated to this forum (which is also great). I guess Hadley's book can be discussed in this forum too, as long as the question is asked in an appropriate channel.

hughparsonage · May 17, 2018, 3:04pm

Just guessing, but the issue with bigstatsr appears to be that the code of conduct could have been interpreted as binding CRAN and the package's users, which is obviously contrary to free software.

Andrea · May 17, 2018, 5:14pm

Yep, that's it, apparently, even though I couldn't find this code of conduct in the README file. Anyway, it looks like there's nothing wrong with CRAN. False alarm.

JohnMount · May 17, 2018, 6:55pm

Not claiming anybody is doing this but: I think we all must be very careful in ascribing all rumors of interaction difficulties to CRAN. I have also heard some negative rumors, but they didn't match my limited direct experience (which has been positive). So I personally discount the rumors, but would not discount any concrete examples (good or bad). If the rumors match somebody else's direct experience I don't mean to undermine that, and I certainly do not intend to claim "just because nothing bad has happened to me it is implausible something bad has happened to someone else."

Most conflicts involve more than one party who may be at fault. I am not meaning to diminish anyone else's direct experience, but I do want to diminish the indirect repeated rumors that CRAN is fundamentally hard to work with. I feel that CRAN is a great benefit, and much better than not having a central package facility or having a central facility run by a company.

I am most definitely not commenting on reprex or bigstatsr, as I don't know anything about their interaction with CRAN. And you are correct in that package state does seem mysterious (I follow other orphaned and removed packages for which I would love to know what happened).

Below I am commenting on other packages in general (though I do have specific examples to back it up).

I think in many cases the package maintainers have been getting away with not obeying even simple CRAN guidelines that have been known for some time. That helps make it seem uneven (and even seem unfair) when one gets caught by one of these guidelines. My personal experience (maintaining multiple packages for multiple years) is: things are easier if you make an extra effort to follow your best guess of a broadening of the guidelines, even if it appears to you other packages have no such requirement. It is not a good idea to "get lawyerly" and attempt to follow the guidelines minimally to the letter.

Each time a package is submitted the submitter checks the following boxes on the user submission form.

In addition to this, the rules say you are supposed to check current versions of packages that depend on your package.

Given the above I have seen popular packages:

Not fixing the errors seen on the results pages. Given the checkbox, this means that when you submit without having fixed errors, you have not only violated the guidance but also affirmatively claimed to follow it.
Adding to their CRAN submission notes that they are breaking their own dependent packages. This causes needless trouble, as the same authors could first fix the dependent package and not need to ask for an exception.
Claiming they have notified dependent packages of breaking changes (when it later comes out they have not). This is particularly nasty because not responding to such messages is one of the quicker ways to get orphaned. So claiming another package is ignoring messages is transferring a harm/risk to that dependent package.

Again this is not to kill discussion, it is just my side of the discussion. I think direct sharing of experiences with CRAN (be they good or bad) is useful. The above (unfortunately, including the bullet points) is some of what I have directly seen both in maintaining my own packages, and interacting with other packages.

Sorry that got long, I promise not to swoop in and debate other comments. Interested to hear what others have to say on the topic.

nutterb · May 17, 2018, 9:01pm

I've had some unpleasant interactions with CRAN (I want a button that says "I got Ripley'd"), but on reflection, almost all of those unpleasant interactions were a result of either a) not understanding/following CRAN's guidelines, or b) writing crappy code.

More commonly, my biggest weakness is I check that box that I've read the CRAN policies--but what I really mean is that I read the CRAN policies in 2010. I don't keep up with them nearly as well as I should. I suspect there's a lot of frustration that comes from this kind of issue, especially when we feel like "we weren't asked to do this before."

Ultimately, CRAN is providing a free service of hosting an impossibly complex archive of packages (including a lot of duplicated work and bad code, a lot of it mine). As it grows, by necessity, it will become more judicious and demanding of what it accepts, lest it become useless.

But to give credit to CRAN, I've received feedback as detailed as noting discrepancies between my source code and the claims in my vignettes. The feedback and effort CRAN puts into its content really is extraordinary when compared to the volume of work they take on.

With respect to code of conduct files, it seems like it would be reasonable to include the code of conduct file in .Rbuildignore. That way, it isn't submitted to CRAN and hosted on the archive, but is available to collaborators on your GitWhatever repository.
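A minimal sketch of what that entry could look like, assuming the file is named CODE_OF_CONDUCT.md and sits at the top level of the package (the file name is only an illustration; adjust it to whatever the repository actually uses). Each line of .Rbuildignore is a regular expression matched against file paths relative to the package root, so the dot is escaped and the pattern is anchored:

```
^CODE_OF_CONDUCT\.md$
```

With that line in place, R CMD build should leave the file out of the source tarball while it stays in the repository. If usethis is installed, usethis::use_build_ignore("CODE_OF_CONDUCT.md") should add an equivalent entry.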

Andrea · May 18, 2018, 9:17am

JohnMount:

Not claiming anybody is doing this but: I think we all must be very careful in ascribing all rumors of interaction difficulties to CRAN. I have also heard some negative rumors, but they didn't match my limited direct experience (which has been positive). So I personally discount the rumors, but would not discount any concrete examples (good or bad). If the rumors match somebody else's direct experience I don't mean to undermine that, and I certainly do not intend to claim "just because nothing bad has happened to me it is implausible something bad has happened to someone else."

Most conflicts involve more than one party who may be at fault. I am not meaning to diminish anyone else's direct experience, but I do want to diminish the indirect repeated rumors that CRAN is fundamentally hard to work with. I feel that CRAN is a great benefit, and much better than not having a central package facility or having a central facility run by a company.

Hi, @JohnMount, I agree with everything you said, and for the sake of clarity let me explain my position in more detail. I'm not saying that the CRAN repository is a bad thing, or "hard to work with". On the contrary, it's one of the best things about R. I can't find the quote anymore, but some time ago I read someone saying that "the CRAN repository is the only example in the Open Source community of a repository of more than 10000 packages, depending on each other, which allows complete beginners to install packages effortlessly on hundreds of different systems" (Windows laptops behind a proxy requiring authentication, Windows laptops not behind a proxy, Mac OS, Linux, servers, supercomputers, etc.). Compare that with the conda experience from behind a proxy, and you'll see.

Just yesterday, I installed a package from CRAN, containing some new experimental methods, which allowed me to use the multiple cores of my laptop to fit a Gaussian Process with a sample size N=100000. I didn't have to install OpenMP or compile anything, nor did I get any issues such as segmentation faults, aborted R sessions, etc. If that's not an astounding achievement, then I don't know what could qualify as one. If the price for having this kind of seamless integration is that CRAN enforces some coding/building policies, then I think we can all agree it's a price worth paying.

What I was just saying is that recently (so, it's just a matter of months: for sure they were not talking about something that happened last year) I read a few threads on Twitter and heard a few people talking in an unusually hostile way about CRAN, with regard to a few packages. I had the vague impression that some CRAN policy had recently changed, and that this had caused some reaction, but I didn't ask for more details because I didn't want to raise the discussion in environments where people can easily misunderstand you or get polarized on issues. Thus I asked for clarification here, where it's easy to have a level-headed discussion.

My conclusion is that they were just some unfounded rumors, as I suspected. Thanks for confirming that!