RStudio Connect and private CRAN timing issue?

I think I've just seen an edge case using our local CRAN repo.

Essentially we're running some CI with a structure similar to the https://github.com/sol-eng/bike_predict sample.

Because of this we often see these two things happen very close together in time:

  • package builds and deploys (including version increments) in our CRAN
  • manifest.json updates picked up in RStudio Connect

During one update today this caused the following error while RStudio Connect was updating a git-backed Shiny app:

curl: HTTP 404 https://cran.mycorp.app/src/contrib/ourPackage_0.2.3.tar.gz

which happened because ourPackage_0.2.3.tar.gz had been replaced by ourPackage_0.2.4.tar.gz during the 10 or so minutes that RStudio Connect was taking to build packages (0.2.3 was now in the archive on the CRAN server).

We are looking at sorting out some of our CI versioning pain anyway here...

but I'm also wondering if we can somehow get RStudio Connect to fallback to looking in https://cran.mycorp.app/src/contrib/Archive/ if a 404 occurs in the contrib folder?
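To make the requested fallback concrete, here is a minimal client-side sketch. It is not how RStudio Connect itself resolves packages; the function names are hypothetical, and the only thing it relies on is the usual CRAN convention that archived tarballs live under src/contrib/Archive/<package>/.

```shell
#!/bin/sh
# Hypothetical helper: print the two places a CRAN-like repo may hold a
# given source tarball: the live src/contrib folder first, then the Archive.
candidate_urls() {
  repo=$1
  pkg=$2
  version=$3
  file="${pkg}_${version}.tar.gz"
  printf '%s\n' "${repo%/}/src/contrib/${file}"
  printf '%s\n' "${repo%/}/src/contrib/Archive/${pkg}/${file}"
}

# Try each candidate in order; --fail makes curl exit non-zero on HTTP
# errors such as 404, so the loop moves on to the Archive copy.
fetch_with_fallback() {
  for url in $(candidate_urls "$1" "$2" "$3"); do
    if curl --fail --silent --show-error --location \
         --output "${2}_${3}.tar.gz" "$url"; then
      echo "downloaded from $url"
      return 0
    fi
  done
  echo "not found in src/contrib or Archive" >&2
  return 1
}
```

For the failing request above, this would be called as `fetch_with_fallback https://cran.mycorp.app ourPackage 0.2.3`.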

Again apologies for the delay here @slodge!

Are you still running into this issue? Connect definitely should be looking in the src/contrib/Archive when / if a 404 occurs in the contrib folder... It depends heavily on the "CRAN-like" convention for naming and such, though - how do you maintain this internal CRAN server? Is it manual?

We're not hitting this any more - I rewired our CRAN folders so that we never delete any of the old packages. Our src/contrib is now a mess - but it works...

Oh no!! Everything is dumped into src/contrib? :sob: So sorry for your trouble - that's super ugly!

Are you defining a PACKAGES file to point at the latest, then, I presume?

Yes - we already had code to update PACKAGES (built on top of miniCRAN-type functionality) and we also maintain an archive folder for each package (along with an index in src/contrib/Meta).

All we've really changed now is to remove the code that deletes from the src/contrib copy when a new package version is available.
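As a rough illustration of a publish step that never deletes old versions, something like the sketch below would do. The function name and directory arguments are hypothetical, and copying every version into the Archive at publish time is an assumption rather than a description of the actual CI code.

```shell
#!/bin/sh
# Hypothetical publish step: add a new tarball to a CRAN-like tree without
# removing older versions. Names and paths here are illustrative only.
publish_tarball() {
  repo_root=$1
  tarball=$2
  file=$(basename "$tarball")
  pkg=${file%%_*}            # "ourPackage_0.2.4.tar.gz" -> "ourPackage"
  contrib="$repo_root/src/contrib"

  mkdir -p "$contrib/Archive/$pkg"
  cp "$tarball" "$contrib/$file"                # keep every version in src/contrib
  cp "$tarball" "$contrib/Archive/$pkg/$file"   # assumption: also mirror it in the Archive
  # The PACKAGES index still has to be regenerated afterwards so that it
  # points at the latest version, e.g. with tools::write_PACKAGES() in R.
}
```

Because nothing is ever removed, a deployment that resolved 0.2.3 from an older PACKAGES file can still download it after 0.2.4 has been published.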

Interesting!! Do you have the tree command installed - any chance you can produce a filtered tree for a given package from the root of your webserver, so that we might have the ability to reproduce this type of issue? It is certainly concerning to me that it was not finding the Archive!

Maybe something like this for one of your packages?

tree -P '*ourPackage*' --prune

We may not be able to reproduce it, but it'd be great to make sure that nothing is going awry! One issue we have seen before on systems using this pattern is directory or other filesystem permissions on the file server causing problems for particular folders.

My favorite way to test file permissions:

namei -l /full/path/from/root/of/the/os/to/the/file.txt

You could also try to reproduce it by using curl to access the URL that Connect was trying to read (in the Archive).

I suspect this is pretty low priority for you now, since you have things working, but maybe this will be useful in the future (for you or others). Please let us know if you run into any more issues!! Apologies for the trouble here!

I don't think tree or a static file dump would help here.

The problem in my original post was genuinely a timing one. It was caused by the Connect server reading the PACKAGES file early in the deployment; because C++ packages can take a long time to build, by the time the server got to installing some packages our CI system had already moved the version on. (This was especially common in our test/dev branches, where people are continually checking in fixes.)
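One way to observe this race from the outside is to compare the version a deployment resolved earlier with what the PACKAGES index lists now. The sketch below is only an illustration with hypothetical function names; it assumes a local copy of the plain-text PACKAGES file, which consists of records with "Package:" and "Version:" fields.

```shell
#!/bin/sh
# Hypothetical check: print the version that a PACKAGES index currently
# lists for one package (the first match, if several are listed).
indexed_version() {
  packages_file=$1
  pkg=$2
  awk -v pkg="$pkg" '
    $1 == "Package:" { current = $2 }
    $1 == "Version:" && current == pkg { print $2; exit }
  ' "$packages_file"
}

# Succeeds only if the index still lists the version that was resolved
# earlier; a failure means a newer build has been published in the meantime.
version_still_current() {
  [ "$(indexed_version "$1" "$2")" = "$3" ]
}
```

For example, `version_still_current PACKAGES ourPackage 0.2.3` would fail once CI has moved the index on to 0.2.4.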

This was definitely an "edge case" and not that important... so I don't feel the need to chase it down too hard.

Ahhhh ok. I didn't grok the timing part. Thanks for the details!! Very glad to hear you have something working for you :smile:

A related-ish question - are y'all familiar with RStudio Package Manager? Would it be helpful if RStudio Package Manager had the ability to build and serve your own private, binary packages? That is a feature we have tossed around, and I would love to have any feedback on the usefulness therein or any related features that would be seen as a necessity alongside such functionality (auditing, logging, monitoring, etc.)?

It popped into my mind since you mentioned building C++ packages :smile: