Strategies for Rendered Notebooks on Private GitHub Repos

teaching

#1

Do you commit RMarkdown output documents to GitHub? What RMarkdown format do you use for work in private GitHub repositories?

In both teaching and research, I tell my students to put their work in a GitHub repo, use github_document as the RMarkdown output format, and commit the .Rmd source along with the .md and *_files/ output to GitHub. This is convenient: it lets me see the results without rendering the .Rmd locally (which can be slow or fail, particularly on old versions), and it gives me a version history of the output, so I can dig back for older versions of figures etc. if necessary.

However, this is also frustrating in that:
a) there is no support for MathJax equations, leaflet maps, or other JavaScript elements (though I think there are some clever work-arounds here that could be better automated?)
b) it clutters the GitHub repo; outside visitors are probably not aware of the differences between the .Rmd and .md versions
c) github_document is not a default or even top-level option in RStudio

HTML-based outputs have obvious advantages, but do not display on GitHub (for public repos we can use something like htmlpreview.github.com, but not for private ones). I’d be tempted to use Blogdown (as I do for my own research notebook) as a way of keeping the .Rmd input and the output content nicely separated and cleanly presented, but this is a bit heavy and again doesn’t have a viable option for private viewing online.

Other probably-overengineered approaches would be something like having Travis-CI render the .Rmds and push the output to some other private destination where I could view the output independent of the source repos.

Alternatively, one could simply have students commit .pdf output, but I’m reluctant to encourage committing binary files that keep changing. I find PDF better suited to the ‘final copy’ document, but maybe I should just stick with it, since at least it renders on GitHub. Another minor improvement would be any way to have output content wind up in a different directory than the input files. I know this is possible with knitr options, but in my experience messing with output paths in RMarkdown usually makes things fragile.

Anyway, would love to hear how others are approaching this.


#2

In case you haven’t seen it already, I follow @jennybryan’s great advice in this regard, which is similar to your current approach (see the section on repo browsability, “21 Make a GitHub repo browsable”).

I’m not sure about a good solution when the document contents are more complex though…


#3

I am all in on github_document, but have felt many of the downsides you mention @cboettig. I just don’t know what else to do that provides privacy (which we MUST have) and doesn’t have unrealistic requirements for instructor, student, or both.

The “clutter” issue doesn’t bother me. In fact, I’m the opposite: I get aggravated when people want me to consume something, but expect me to git clone, install all dependencies, then run or render locally. I feel it’s fruitful in these settings to just let go of some of these taboos re: ignoring downstream products or quarantining them. I’ve become very pragmatic.


#4

Thanks @jennybryan, you reassure me at least that I’m not recommending something different than the community.

Completely agree with you on the importance of not having to clone and re-run; that just doesn’t scale. I’m all for committing output products, but I sometimes wish they’d all show up in a separate directory called output/ or something. (Also wish GitHub would just stop rendering the .Rmd version, so I could deep-link individual lines etc! It’s not a Jupyter notebook, for goodness’ sake!)

Also, my students often forget to commit the .md and/or the image files in *_files/ (or when they do, they do so on separate machines and immediately get version conflicts). Maybe I can add a check for the output files to my course .travis.yml scripts…
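A check along those lines might look something like the sketch below. It is only a rough illustration under a few assumptions: the function name `find_missing_outputs` is made up for this example, it assumes github_document writes `foo.md` next to `foo.Rmd`, and it only looks for images referenced with plain markdown image syntax (so `<img>` tags would be missed). Because a CI checkout contains only committed files, a file missing there is a file that was never committed.

```python
import re
from pathlib import Path

# Matches the target of a markdown image link such as ![](hw1_files/plot-1.png)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def find_missing_outputs(root="."):
    """Return a list of human-readable problems found under root.

    Hypothetical helper: flags every .Rmd with no sibling .md, and every
    local image referenced from a rendered .md that is not on disk.
    """
    problems = []
    for rmd in sorted(Path(root).rglob("*.Rmd")):
        md = rmd.with_suffix(".md")
        if not md.is_file():
            problems.append(f"{rmd}: no rendered {md.name}")
            continue
        for target in IMAGE_LINK.findall(md.read_text(encoding="utf-8")):
            # Remote images are not part of the repo, so skip them
            if target.startswith(("http://", "https://")):
                continue
            if not (md.parent / target).is_file():
                problems.append(f"{md}: missing image {target}")
    return problems
```

A CI step could call `find_missing_outputs(".")`, print each problem, and exit with a non-zero status when the list is not empty, which would fail the build.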


#5

I also use github_document for the reasons @cboettig and @jennybryan listed. I’ve also had a few instances, earlier in the semester, of students forgetting to commit images but I instruct them to clear their git pane in RStudio, which necessarily requires that these files are committed and pushed. It only took one instance of losing points for incomplete documents for students to remember to do this, and I feel like this is a good lesson to learn. I think the git pane (as opposed to having to remember to check git status) makes this easier.
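For anyone who wants the same "clean pane" check outside RStudio, here is a minimal sketch built on `git status --porcelain`, which prints one line per pending change and nothing at all when the working tree is clean. The function names `parse_porcelain` and `pending_changes` are made up for this example, renamed files are not handled specially, and the check says nothing about whether commits have been pushed.

```python
import subprocess


def parse_porcelain(text):
    """Split `git status --porcelain` output into (status, path) pairs.

    Each line is a two-character status code, a space, then the path,
    e.g. '?? figure.png' (untracked) or ' M report.md' (modified).
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entries.append((line[:2], line[3:]))
    return entries


def pending_changes(repo="."):
    """Return pending changes in repo; an empty list means a clean tree."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)
```

Anything returned by `pending_changes()` corresponds to an entry that would still be sitting in the git pane.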


#6

Yes! I have recently come to the realization that ‘the git pane should be clean’ / ‘think of the git pane as a to-do list’ is a great way of thinking, and I really need to remember to emphasize that to my students. (I also found that docking points for missing images or failing Travis tests brought immediate improvement, though we let them resubmit for that assignment.)


#7

One other thing that helps with remembering to commit and push all associated files is teamwork: it’s much less likely that all 4 students forget to do this than just 1. So having a few mini team assignments early on (if that works with the structure of your course) can help as well.