When I do analyses it’s common for me to run off a butt-load of plots. Even with faceting and grouping and all those fun tricks, an analysis might spit out anywhere from dozens to hundreds of plots.
My typical workflow is to turn the crank on my analysis (which usually runs on a remote server), write all of these exploratory plots out to disk, then transfer the whole directory of plots to my own drive with rsync so I can browse them.
At some point I start picking out plots for display (e.g. in PowerPoint), and that's where changing them for public consumption gets a little frustrating. It seems wasteful to go back into my analysis code, inject a bunch of extra ggplot code to polish them up, and then run the whole thing again just because I want a couple of plots to have light-on-dark styling, a transparent background or a better colour scheme. On the other hand, I’d rather avoid modifying the PDFs or SVGs in Illustrator, because if the analysis changes I have to redo that work, and PDF output, while layered, isn't layered semantically, so the editing itself is a pain I’d like to minimise.
How can I do this better? It strikes me that it might be a good idea to start saving my exploratory plot objects, so that if I ever need to go back and pretty one up, I can just retrieve the object and add new ggplot2 elements to change it. I’ve done this before with GLMs, exporting a named list with metadata encoded in the element names and saving the structure to disk with
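A minimal sketch of the save-then-restyle idea, assuming `saveRDS()`/`readRDS()` as the serialization step. The `<dataset>__<x>__<y>` naming scheme, the example plots and the file paths are all illustrative. The point is that a stored ggplot object can have scales, labels and themes added to it later, without re-running the code that built it.

```r
library(ggplot2)

# Exploratory plots collected in a named list. The naming scheme
# "<dataset>__<x>__<y>" is a made-up way of encoding metadata.
plots <- list(
  mtcars__wt__mpg = ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point(),
  mtcars__hp__mpg = ggplot(mtcars, aes(hp, mpg, colour = factor(cyl))) + geom_point()
)

# Save the plot objects themselves (not rendered images) to disk.
plot_file <- file.path(tempdir(), "exploratory_plots.rds")
saveRDS(plots, plot_file)

# Later, possibly in a fresh session: reload and pick one plot by name.
restored <- readRDS(plot_file)
p <- restored[["mtcars__wt__mpg"]]

# Polish it for presentation by adding ggplot2 components,
# without re-running the analysis that produced it.
p_slide <- p +
  scale_colour_viridis_d(name = "Cylinders") +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon") +
  theme_minimal(base_size = 14) +
  theme(
    plot.background  = element_rect(fill = "transparent", colour = NA),
    panel.background = element_rect(fill = "transparent", colour = NA)
  )

# Render only the polished version.
slide_file <- file.path(tempdir(), "wt_mpg_slide.pdf")
ggsave(slide_file, p_slide, width = 6, height = 4, bg = "transparent")
```

Because `+` returns a new plot object, the exploratory version stored in `restored` is left untouched, so the raw and polished versions can coexist.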
So this is starting to sound like a package idea. A basic version of this could write plots to a data frame list column instead of out to disk directly; a more complex version could write them to an external database. Either way, such a database could:
- Return plot objects based on metadata, so that you can write them out or view them (or modify them first);
- Return selections of plot objects, in case you need to make changes to a bunch of plots;
- Potentially do something akin to version control for plots, if that would be useful.
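The basic version (a data frame with a list column) might look roughly like the following. This is a sketch, not an existing package: `make_plot()`, `find_plots()`, `restyle_plots()` and `write_plots()` are hypothetical names, and the metadata columns (`dataset`, `x`, `y`) are just one possible schema. It covers retrieving plots by metadata and modifying a selection in bulk; versioning is left out.

```r
library(ggplot2)

# Hypothetical helper: one exploratory scatterplot per variable pair.
make_plot <- function(dataset, x, y) {
  ggplot(get(dataset), aes(.data[[x]], .data[[y]])) + geom_point()
}

# Plot registry: metadata in ordinary columns, plot objects in a list column.
registry <- data.frame(
  dataset = c("mtcars", "mtcars", "airquality"),
  x       = c("wt",     "hp",     "Temp"),
  y       = c("mpg",    "mpg",    "Ozone"),
  stringsAsFactors = FALSE
)
registry$plot <- unname(Map(make_plot, registry$dataset, registry$x, registry$y))

# Return the rows whose metadata match all supplied criteria,
# e.g. find_plots(registry, dataset = "mtcars", y = "mpg").
find_plots <- function(registry, ...) {
  criteria <- list(...)
  keep <- rep(TRUE, nrow(registry))
  for (col in names(criteria)) {
    keep <- keep & registry[[col]] %in% criteria[[col]]
  }
  registry[keep, , drop = FALSE]
}

# Apply the same modification function to every plot in a selection.
restyle_plots <- function(selection, f) {
  selection$plot <- lapply(selection$plot, f)
  selection
}

# Write each plot in a selection to a PDF named after its metadata.
write_plots <- function(selection, dir = tempdir()) {
  paths <- file.path(
    dir, paste0(selection$dataset, "_", selection$x, "_", selection$y, ".pdf")
  )
  for (i in seq_len(nrow(selection))) {
    ggsave(paths[i], selection$plot[[i]], width = 6, height = 4)
  }
  invisible(paths)
}

# Persist the whole registry, then later pull out and polish a subset.
registry_file <- file.path(tempdir(), "plot_registry.rds")
saveRDS(registry, registry_file)
registry <- readRDS(registry_file)

mpg_plots  <- find_plots(registry, dataset = "mtcars", y = "mpg")
mpg_slides <- restyle_plots(mpg_plots, function(p) p + theme_minimal(base_size = 14))
written    <- write_plots(mpg_slides)
```

Since a selection is itself a data frame, restyling a batch of plots and writing them out are both just operations on rows. An external-database version could, in principle, keep the same metadata columns and store the serialized plot objects alongside them.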
Am I overthinking/overengineering this? Does anyone else have this problem dealing with too many plots? Could I just eliminate this problem by having a better workflow in other ways?