I'm dreaming of a (hosted) RStudio CI/CD

docker
dependencies
travis-ci
shinyappsio
shiny
#7

Thanks for taking the time to circle back to this, @cole, and apologies if my comments are a little all over the map.
Also, disclaimer: I gave up on packrat 18 months ago or so, so I am not up to speed with what the package can do.

The deployment process via rsconnect::deploy... works fine, no qualms here (though I would like to have every git commit automatically deployed).
My qualms are indeed with the fact that, via the dark magic that is packrat, rsconnect::deploy... adds another, orthogonal layer of specifying dependencies (on top of DESCRIPTION).

Another recent example to illustrate (hope I got this right): I had a commit with a package_foo::bar() call, but had never listed package_foo under Imports in DESCRIPTION.
Building on Travis worked, because said code was never called in my build script.
But then, oddly, rsconnect::deploy() (from Travis to shinyapps.io) failed, because it couldn't find package_foo to add to the manifest.
At first I was confused; then I realised that this was because, under the hood, packrat was apparently parsing the entire source for dependencies, and erroring because it couldn't find the package in the Travis session.

This is not a bug and I was able to fix this quickly, but these things add up, and sometimes I take a long time to figure them out. Seems to me like DESCRIPTION + packrat + x is too much.
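For anyone following along, the fix in that example was simply to declare the dependency in DESCRIPTION, so that both Travis and packrat's source scan can resolve it. A minimal sketch (with package_foo standing in for the real package, as above, and the app metadata invented):

```
Package: myapp
Title: Some Shiny App
Version: 0.0.0.9000
Imports:
    shiny,
    package_foo
```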

I'd really love to standardise on one way to express dependencies, even if that's at a cost.

To me, DESCRIPTION is the most attractive way, because it's already used in two prominent use cases (pkg dev and Travis CI/CD), and it covers a decent number of edge cases (via GitHub remotes).

So, my (probably ill-informed?) wish list right now would be:

  1. let me disable packrat in rsconnect::deploy(), and just let me bring my own DESCRIPTION for Shiny apps, even if it's not a package (just as on Travis CI/CD).
  2. merge/standardize the Travis CI/CD docker image with the docker images behind shinyapps.io, and let me use those everywhere and easily (locally within RStudio, on Travis CI/CD, on shinyapps.io).
  3. standardize on https://github.com/rstudio/shinyapps-package-dependencies as a way to specify system dependencies.
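For context, the per-commit deployment from item 1 can already be wired up by hand today. A rough .travis.yml sketch (the env var names are placeholders I made up, and the account credentials would come from encrypted Travis settings; this is not a documented recipe):

```yaml
language: r
cache: packages

# deploy every commit on master to shinyapps.io
deploy:
  provider: script
  script: >-
    Rscript -e 'rsconnect::setAccountInfo(Sys.getenv("SHINYAPPS_ACCOUNT"),
    Sys.getenv("SHINYAPPS_TOKEN"), Sys.getenv("SHINYAPPS_SECRET"));
    rsconnect::deployApp()'
  skip_cleanup: true
  on:
    branch: master
```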

Finally, I'd love to pay for services around this. RStudio Connect meets Travis CI meets Heroku kind of thing.

Still happy to talk about this if you, @tareef and others are still interested.

#8

Looked at from a distance, I now think I'm really starting from two quite separate pain points:

  1. packrat (inside rsconnect::deploy()) + DESCRIPTION is too much. I'd like to just use DESCRIPTION. DESCRIPTION is also human-readable, and just overall simpler than packrat.
  2. I'm having a hard time reconciling a GitHub-based CI/CD workflow with shinyapps.io and especially RStudio Connect. So I go through Travis in between, which is a subtly different build environment.

#9

As an aside (but on a related note): I would love to have Connect commit my code to git on every deploy!

#10

These are great thoughts! Thanks so much for the thorough post. I can definitely see how DESCRIPTION and packrat feel redundant, and then become difficult to debug, because essentially the same information is stored in two places.

One thing that would help me clarify a bit: are you building a package? If not, how are you using the DESCRIPTION file? Or are you building the package and then deploying something to do with the package?

Usually, DESCRIPTION files are used to manage the minimum viable dependencies of an R package (not necessarily of an arbitrary R script). Further, the reproducibility of a DESCRIPTION file is limited, because it only points at the latest version of a package. For instance, your Travis CI build using DESCRIPTION could fail if your deploy picks up a different version of a package than you used in dev (because CRAN or GitHub got updated).

Further, DESCRIPTION is usually used for the minimum set of packages required to use a package's exported functionality (i.e. dependencies), in contrast to the minimum set of packages required to develop a package (i.e. build dependencies, developer dependencies, or something like that). When using packrat and DESCRIPTION together, this distinction is usually how I keep myself sane (packrat for build dependencies, DESCRIPTION for use-able dependencies).

RStudio Connect / shinyapps.io use packrat, which locks down a specific package version and repository to be sure that the correct version of the package is installed (even if a newer version is released). Further, it flexibly allows installing a specific GitHub commit hash and newer/older versions of packages (checkpoint requires that you pick a date in CRAN history - hopefully that has all of the versions you want). That said, I know that the UX can be pretty clunky. My hope is that packrat or some replacement will ultimately mesh the vision with the user experience and allow for the type of single-source-of-version-truth that you are looking for.

I definitely hear you loud and clear on the CI/CD workflow being tricky to marshal with shinyapps.io and RStudio Connect. We are hopeful of improving that process in the near future, so I will make a note to let you know when we can improve that story!

A few follow-up questions:

  • Do you like just grabbing the latest version of all packages? Would you expect breakages/errors from picking up new versions on deploy? Would you be amenable to committing a file in your GitHub repository that specifies which package versions you are using locally for development?
  • Are you familiar with GitHub hooks? Would you be comfortable with using a GitHub hook to drive deployment and bypass the Travis CI build server?

I definitely love packrat for its vision and have become accustomed to some of the clunky UX. One piece you might look at using, if you are interested in tracking exact package versions over time is packrat::.snapshotImpl(".", snapshot.sources = FALSE) which just creates a packrat.lock file by parsing your code without any other "magic." Obviously, restoring an environment from that .lock file will require using more of packrat's shenanigans, but this can be a useful way to see the actual dependencies of your parsed code in a concise fashion!
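For readers who haven't seen one: the packrat.lock that this writes is a plain-text, DCF-style file, roughly along these lines (the versions and hash here are invented for illustration):

```
PackratFormat: 1.4
PackratVersion: 0.4.9
RVersion: 3.5.1
Repos: CRAN=https://cran.rstudio.com/

Package: shiny
Source: CRAN
Version: 1.2.0
Hash: 0123456789abcdef0123456789abcdef
```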

#11

Interesting, I never thought about it in that direction.
I always thought about committing to GitHub triggering a deploy to RStudio Connect.

Perhaps committing from an RStudio Connect deploy would only work reliably if you're working alone; if you're working together, there could be merge conflicts.

I might be wrong, but I think the usual way for deployment platforms (such as RStudio Connect, or Heroku or what have you) is to deploy from git commits, not the other way around.

#12

Yes, I agree. I'd be curious to hear more about the why, @iain. Does deployment feel easier to do than a commit? Are you looking to tie a specific version of deployed content to a GitHub commit (which the other approach would also allow)? Are you using git only locally, without a server counterpart?

#13

Sure - the main reason is that deploying is my current workflow. Committing and then deploying is an extra step, and the two have the potential to get out of sync if the deployment fails. If Connect could commit on a successful deploy, it would mean that everyone using our Connect server would be following good code practice with no extra work on their part, which would be a huge win!

#14

Agreed, it might not be best practice for large teams, but most of our Shiny work is done by individual developers.

The issue with GitHub triggering the deploy is that you have to have the infrastructure in place to allow it (i.e. own the CI system), and also have testing set up for the Shiny app, which is nontrivial.

#15

@cole some more details:

re: why DESCRIPTION

I'm kinda used to always having a DESCRIPTION around in any given project, because I always use Travis CI/CD.
I also use it in pkg dev, but mostly it's to talk to Travis CI.

re: DESCRIPTION vs. packrat.lock

packrat for build dependencies, DESCRIPTION for use-able dependencies

That's a great way to distinguish the logic and purpose of the two tools, thanks @cole!

I understand that packrat is vastly more powerful than DESCRIPTION -- really, it's a tool for a different purpose, as you described.

I guess my fundamental hangup with packrat was always that it (programmatically) wrote out a (human-readable) packrat.lock file (if I remember correctly?), and that this needed to be committed, even though it was kind of derivative.
Not derivative in the way man/ is derived from roxygen2 comments, but derived from past install_x() calls, which packrat would magically track.
I'm a bit of a clean-commit freak, and it freaks me out when there's stuff in my commit that I haven't actually written myself.

I guess the bigger point (aside from my commit gripe) is that, in terms of UX, I prefer a dependency tool where I have to state dependencies explicitly and manually.
To me, this makes it a lot easier to version-control, and, more importantly, reason about unexpected behavior.

(Committing the sources, as packrat requires, is also a concern).

I better understand now that while DESCRIPTION gives me that control-freak UX I like, it's not, per se, the right tool for the job.

follow-ups

  • Do you like just grabbing the latest version of all packages? Would you expect breakages/errors from picking up new versions on deploy? Would you be amenable to committing a file in your GitHub repository that specifies which package versions you are using locally for development?

For now, I get around this problem by using the Remotes: field in DESCRIPTION quite a lot, referring to individual releases or even commit hashes.

The problem in that scenario is my local machine, where, lacking the graces of packrat, I sometimes have to run a lot of devtools::install_github(ref = ...) calls to get my library into the proper state.
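Concretely, that kind of DESCRIPTION pinning looks something like this (the owners, package names, and refs are invented for illustration):

```
Imports:
    package_foo,
    package_bar
Remotes:
    someuser/package_foo@v0.2.1,
    otheruser/package_bar@a1b2c3d
```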

  • Are you familiar with GitHub hooks? Would you be comfortable with using a GitHub hook to drive deployment and bypass the Travis CI build server?

Great idea, thanks @cole. I'm familiar with the idea, but have never worked with GitHub hooks so far.

Admittedly, I'm also a bit partial to the ecosystem around Travis CI and I actually use it in all my projects, even outside of pkg dev.
For example, I'll run shinytest on Travis against some app.
Or I'll deploy some static website from some *.Rmd from Travis.
So, unless RStudio offered all these things (which might just be crazy scope creep), I might have a hard time weaning myself off of Travis.

I definitely love packrat for its vision and have become accustomed to some of the clunky UX. One piece you might look at using, if you are interested in tracking exact package versions over time is packrat::.snapshotImpl(".", snapshot.sources = FALSE) which just creates a packrat.lock file by parsing your code without any other "magic."

This is a great idea, I don't think I was aware of this when I last used packrat.

#16

One more question, @cole: Travis CI supports a bunch of deployment providers, which I've found quite useful for GitHub Pages, Google Firebase, etc.
They don't, afaik, do anything that you couldn't do with a custom bash script, and they wouldn't address any of the above issues, but they remove a little bit of friction and give users a clear path to deployment.

Is this something that RStudio might be interested in supporting: shinyapps.io, cloud.rstudio, or even self-hosted products as deploy providers on Travis?

I've considered just trying to hack this for myself, though I'm not familiar with the tooling, and I guess the idea is that the service provider (RStudio) writes the deploy script, not some wannabe hacker :).

#17

Very interesting thoughts. Regarding your well-articulated concerns for packrat:

I think this is a fantastic point. You are rightfully bothered by packrat's auto-snapshot feature and the other magic that happens behind the scenes. I, for one, never commit the packrat/src folder; I'm with you, committing sources makes no sense. There is a packrat option to gitignore packrat sources that I always enable. The only things I commit are packrat.lock and packrat.opts. Setting auto.snapshot = FALSE ensures that only snapshots you trigger will happen (i.e. no dirty commits).

Further, using .snapshotImpl(snapshot.sources = FALSE) will let you use the classic R library (.libPaths()) without packrat's libpath magic. Again, packrat.lock would only be updated when you call .snapshotImpl. Granted, you will have to opt into the libpath magic if you ever want to restore the environment with packrat::restore(), but this could make for a decent dev workflow, and it is also possible to edit packrat.lock manually, if you wanted to go that route. (I explore questions about packrat ad nauseam, so apologies for the long response, and feel free to read more.)
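Pulling the above together, this workflow would look roughly like the following in R. This is a sketch only: it assumes an already-initialised packrat project, and note that .snapshotImpl is unexported, hence the `:::`.

```r
# only snapshot when explicitly triggered, and gitignore packrat/src/
packrat::set_opts(
  auto.snapshot  = FALSE,
  vcs.ignore.src = TRUE
)

# write packrat.lock by parsing the code, without snapshotting sources
packrat:::.snapshotImpl(".", snapshot.sources = FALSE)
```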

Regarding deployment: the only piece it seems you are really accomplishing on Travis is deciding the exact package versions (and even then, you just pull the latest versions). I guess my point is: for deployment, what if you were able to commit a single file (i.e. something like packrat.lock) and then use that to deploy to Connect / shinyapps.io? In that case, the Travis CI build would be redundant (unless you are running a suite of tests on a pre-deployed something-or-other); the Connect server / shinyapps.io could take care of reproducing the CI environment.

In related fashion, I think this would mean that a deployment provider from Travis would likewise be a little redundant. I.e., it would be important to determine what we get from a Travis build that we could not get from the repo directly; if a sufficient packrat.lock-y file were provided in the repo, we would not need anything the Travis build provides. Thanks for sharing about those, though, as I was unaware; I am definitely interested in checking into this stuff a bit deeper!

On the other hand, running shinytest or other things on Travis CI sounds like a really interesting blog post! I'm not sure if you have a blog of your own, or if you would be interested in submitting to RViews, but I for one would expect some interest there!

#18

Yes, I can see where you're coming from. Many R users are not familiar with git. I think there may be some ways to get to the solutions you envision without having Connect do the committing. I'll have to think on this a bit more, but I am hopeful that there may be a way to make the commit process an easy part of a developer's workflow, as well as to deploy without needing a formal CI system, Shiny app testing, etc.

#19

Great!

I would be interested to learn more about why it wouldn't be a good idea to have an extra flag in the deploy command that tells Connect to do the commit after a successful deploy, based on the git configuration in the project.

#20

I think there are several reasons. The short of the matter is that the .git folder is not published to the Connect server (and there is really no reason to include it). It would potentially make deployments much bigger, and it would add a dependency to Connect (requiring git to be installed). Even if Connect did have the ability to make a commit, there is no notion of a commit message, no way to get Connect to push the commit out to a server, and no way for the user to control the commit behavior (git users typically have conventions around their commits); and, as was mentioned earlier, there would be version conflicts across users.

I think that brain dump only scratches the surface of the challenges associated with such a feature. Basically, it would very much complicate the client / server interaction. It makes way more sense to do the commit on the client before / after deployment. All you need then is a way of triggering a commit locally (there are R packages for that) and a tie-in to either the deployment process or RStudio to prompt the user for a commit message. This sounds like an RStudio Add-in, to be honest. I think there is a way to accomplish your goal much more elegantly than the feature you propose.

#21

Thanks for taking the time to educate me on the complications; it is really informative. I agree, I think an RStudio Add-in would solve my request elegantly.

#22

Hello everyone,

Hope I'm not late to the party. Great stuff; Max covered most of the pain points that we're trying to tackle in our corporate environment. We're working on building a CI/CD workflow for R packages (as is everyone, I guess), which, simplified, would look like:

Clone a git repository -> Install dependencies -> Run tests -> Build a package -> Upload to Artifactory -> Deploy

Having spent some time reading the comments, I just want to emphasize that we would really love to standardize the way we express dependencies. I like the approach of Travis CI, where dependencies are defined in the DESCRIPTION file and additional dependencies are defined in .travis.yml. The Ruby Gemfile is another example: it has dependency groups which clearly state which dependencies are required when you're developing locally versus in production.
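As a rough sketch of the pipeline above in shell (the hosts, repo names, and Artifactory path are placeholders, and it assumes the remotes and devtools packages are available on the build agent):

```shell
#!/bin/sh
set -e

git clone ssh://git@bitbucket.example.com/team/mypkg.git
cd mypkg

# install dependencies declared in DESCRIPTION
Rscript -e 'remotes::install_deps(dependencies = TRUE)'

# run tests, then build the source tarball
Rscript -e 'devtools::test()'
R CMD build .

# upload to an internal CRAN-style repo on Artifactory
curl -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" \
  -T mypkg_*.tar.gz \
  "https://artifactory.example.com/artifactory/cran-local/src/contrib/"
```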

Looking forward to some feedback (or to setting up a call).

#23

Glad to hear someone else shares (some of) these concerns, @amarmesic.
I'm also still up for anything to explore this further.

#24

As far as packages and CI / CD are concerned, I think some valuable thoughts / discussion can be had in these topics:

I think the latter makes a point about when you would clone the git repository / kick things off. Would you be looking for a git hook that you control, or for a service that watches the git repo for tags / commits on a certain branch to trigger a build? You might also think about a timeline: maybe kick off a build / test every week, even if no commits have made it to master?

#25

Thank you @cole! We would be looking for a hook that we control; in our case it's BitBucket and Jenkins. Nightly builds are something we have adopted, making sure master stays as clean as possible.

What we (@iain and myself) are more interested in at this point is an npm / pip-like CLI solution for R:

  • you specify all your dependencies in a .yaml file
  • you run the tool and the magic happens: you don't care whether devtools depends on roxygen2 or any other package, but all your dependencies get resolved
  • you run your tests (again, they should not fail just because 'testthat' is not installed)
  • you deploy to Artifactory
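To make that concrete, such a spec file might look like this (entirely hypothetical; no such tool or schema exists for R today, afaik):

```yaml
# hypothetical R dependency spec, npm/pip style
dependencies:
  shiny: ">= 1.2.0"
  dplyr: "== 0.8.0"
development:
  testthat: "*"     # only needed to run tests
  roxygen2: "*"     # only needed to build docs
deploy:
  target: artifactory
  repo: https://artifactory.example.com/artifactory/cran-local
```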

I'm not sure whether RStudio Package Manager is addressing these challenges (or has plans to be the npm/pip for R), but it would be perfect if it did.

We'll evaluate RSPM and share some feedback soon.

Thanks,
Amar

#26

For people who are still interested in a more streamlined CI/CD experience, the new GitHub Actions may be interesting.
It's all built on Docker containers, and as such allows you to develop in the same container that you're also using for CI/CD.
(I guess this was technically possible with Travis as well, though not as flexibly.)

In my experience, this has simplified my CI/CD workflow greatly.
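For the curious, a minimal workflow in GitHub Actions' YAML syntax might look like this. A sketch only: the secret names and the image choice are mine, and rocker/shiny is just one example of a container shared between local dev and CI (it would also need the remotes and rsconnect packages available).

```yaml
# .github/workflows/deploy.yml
on:
  push:
    branches: [master]

jobs:
  deploy:
    runs-on: ubuntu-latest
    container: rocker/shiny:latest   # same image used for local dev
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: Rscript -e 'remotes::install_deps(dependencies = TRUE)'
      - name: Deploy
        env:
          SHINYAPPS_TOKEN: ${{ secrets.SHINYAPPS_TOKEN }}
          SHINYAPPS_SECRET: ${{ secrets.SHINYAPPS_SECRET }}
        run: Rscript -e 'rsconnect::deployApp()'
```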

I've developed some custom R actions for GitHub Actions, and an accompanying ghactions package:

Announcement is here:
