I have been using RMarkdown for my analysis code for some time and I really enjoy it. My workflow typically consists of a single large Rmd file broken into sections, each responsible for a different portion of the code. One hiccup I keep running into is that my import and data cleanup code takes too long to run, so I resort to saving the output to a file that can be loaded back in. (I know about caching, but I feel better about saving the data to a file that vanilla R can read.) Since I have already processed those chunks and saved the data, I need an efficient way to skip them on later runs. I realize I can put eval=FALSE in the chunk options, but I would have to do that for a lot of chunks, and it would take a lot of time. My proposed solution: chunk tagging with option overrides.
From the end user's perspective, when authoring a code chunk, there would be a new chunk option called “tags”. This would be a comma-separated list of strings, for example “Import”, “Cleanup”, “Modeling”, and so on. Then, at the top of the document, there would be a YAML block where each tag can list a set of chunk options that override the options specified on the chunk itself. If a chunk has multiple tags assigned, the first matching override listed in the YAML block would be authoritative. There is certainly room for debate on that; I am not sure it is the best rule, but one has to start somewhere. With this, I can tag my chunks as I write them, and if I want to stop those chunks from running, I place “eval=FALSE” in the override options for the appropriate tag at the top of the file. If I decide I need to re-process the data, I can change it back.
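To make the idea concrete, here is a rough sketch of how the override rule might be prototyped today with a knitr option hook, which knitr runs for any chunk whose named option is not NULL. This is only an illustration under assumptions: the `tags` option, the `tag_overrides` list, and the `apply_tag_overrides()` helper are hypothetical names, and the overrides are written as an R list here rather than read from the YAML header.

```r
# Hypothetical per-tag overrides. Order matters: the first entry whose tag
# appears on the chunk wins. In a real document this list could be read from
# the YAML header instead, e.g. rmarkdown::metadata$tag_overrides
# (that key name is an assumption, not an existing feature).
tag_overrides <- list(
  Import   = list(eval = FALSE),
  Cleanup  = list(eval = FALSE),
  Modeling = list(echo = FALSE)
)

# Apply the first override (in `overrides` order) that matches one of the
# chunk's tags; chunks with no matching tag are returned unchanged.
apply_tag_overrides <- function(options, overrides = tag_overrides) {
  # Accept either tags = "Import, Cleanup" or tags = c("Import", "Cleanup")
  tags <- trimws(unlist(strsplit(as.character(options$tags), ",")))
  hit <- names(overrides)[names(overrides) %in% tags]
  if (length(hit) > 0) {
    options <- utils::modifyList(options, overrides[[hit[1]]])
  }
  options
}

# Register the helper as an option hook so it runs for every chunk that
# sets the (hypothetical) `tags` option. Guarded so the sketch also runs
# in a plain R session without knitr installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_hooks$set(tags = function(options) apply_tag_overrides(options))
}
```

With this in a setup chunk, a chunk header such as `{r import-data, tags = "Import"}` would pick up eval = FALSE from the list, and flipping that one entry back to TRUE would re-enable every chunk tagged “Import”.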
Chunk tags would also open up a number of cool things the RStudio IDE could do. Examples would be color highlighting for each code chunk (chunks with multiple tags could show a rainbow of the applicable colors), executing all chunks with a specific tag, and expanding or collapsing all chunks with a specific tag. The IDE could also include a new UI component for easily adding a tag to or removing a tag from a particular code chunk, plus a tag explorer where the colors could be adjusted.
I think this could be really cool, and I would not mind trying to get it coded up, but I am not very familiar with how RMarkdown, knitr, and pandoc fit together (it has been a black box to me so far). As for RStudio itself, I thought there might have been a push to rewrite the IDE in Qt and move away from the web browser approach. I would hate to waste my time working on something for the current architecture when there could be a change around the corner.
What do people think?