I’ve experimented with using Google Drive and GitHub with my team (a small ecological research team) for version control and collaboration. I’ve found that both have their uses, and I’m keen to share how I’ve been doing it so I can hear how others are doing things, and whether I’m on the right track.
I initially started off committing everything I worked on to GitHub in different subfolders of the same repo. All of my internal analyses that weren’t meant for a public report or peer-reviewed paper went into different folders in one general ‘internal’ private repo. This worked all right when it was just me using the repo. But when I brought a co-worker into the mix, we soon realized what a pain it actually is to collaborate on GitHub on a day-to-day basis. We were spending a lot of time messing around with merge conflicts and all sorts of other unintuitive issues. We felt GitHub was cumbersome for day-to-day internal analysis collaboration.
So now I would like to move back to simply using Google Drive for internal analyses. Google Drive is great for version control, especially now that you can ‘name versions’ in Google Drive, similar to a GitHub commit. I sometimes rely on Google Drive’s revision history to actually roll back a script, because it’s far more intuitive than doing the same in Git. On top of that, every time you save your script it gets an unnamed version in Google Drive, so the chances of losing your work are actually lower with Google Drive. Google Drive lets you share all the files and data you need, and with the here() package we shouldn’t have to worry about working directories.
I think GitHub is useful for presenting analysis in an open science context for public communication artifacts, whether that’s a paper, a poster, a presentation, or a dashboard. Using the fork-and-pull method, external collaborators or reviewers can fork your repository, make changes, and send you a pull request. In practice that probably won’t happen often. The biggest benefits of putting my analyses into public repos on GitHub are that it adds another level of peer review, shows there’s nothing to hide in my analytical methods, and lets other people build on my work.
So in summary:
For internal analysis: We use Google Drive combined with RStudio projects and the here package, and load data from within our Google Drive folder. Doing this, we get easier collaboration and maximal portability of scripts.
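As a concrete sketch of what that workflow looks like inside a script (the folder and file names below are hypothetical placeholders, not our actual project layout): open the .Rproj file from the synced Google Drive folder, then build every path with here() instead of setwd().

```r
# Minimal sketch of the internal workflow described above.
# Assumes an RStudio project (.Rproj) at the root of the shared
# Google Drive folder; "data" and "output" are example subfolders.
library(here)  # install.packages("here") if needed

# here() resolves paths relative to the project root, so the same
# script runs unchanged on every collaborator's machine, wherever
# their synced Google Drive folder happens to live.
surveys <- read.csv(here("data", "surveys.csv"))

# Writing results the same way keeps outputs inside the shared folder.
write.csv(surveys, here("output", "surveys_clean.csv"), row.names = FALSE)
```

The key design point is that no script ever hard-codes an absolute path like `C:/Users/…/Google Drive/…`, which is what usually breaks portability between machines.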
We use GitHub for public analyses related to science communication artifacts. Scripts there load data from citable data packages or from our internal database via an API.
Does anyone else use a similar workflow? Are there any disadvantages to this that I may not see?