@mfherman Since you said you were a newer R user, have you looked into the book R for Data Science? It's a great resource for getting started with R and really focuses on the tidy model (it is written by Hadley Wickham after all). The last section of the book is all about communicating results and has chapters on RMarkdown, everything you can do with it, and how to incorporate analysis into it seamlessly.
I like it and I'm working more towards this, but at the same time I feel like in doing so I am rejecting the original design and purpose of R Notebooks (at least as described in R4DS). However, as all my physical lab notebooks have also been failures, it is not surprising I can't maintain a digital one.
From a private-sector corporate perspective, I've found RMarkdown (specifically knit to HTML) to be an incredibly powerful communication tool for analysis delivered to managers, stakeholders and CxO positions. The Bootstrap framework (for HTML specifically) allows the report to be opened via email, even on a mobile device (with responsive design on mobile). This is very valuable to a CxO on the go who works primarily on their phone. It also allows for low-barrier-to-entry sharing of reports among departments or other analysts (in contrast to Tableau, Power BI, PowerPoint). And finally, because the HTML output can be opened right in your desktop browser, you can keep the report in a very convenient place (a tab in your browser), which cuts down on 'Alt+Tab' or having to open another application to render it.
I think the convenience of the html markdown file format is something not praised as much. I've found it to be the most powerful persuasive detail that has allowed me to continue to use RMarkdown for my work.
Something I find important that hasn't come up yet: I like to render R Markdown (and specially-crafted R scripts) so I can revisit an analysis later w/o actually redoing the analysis.
Example: the gapminder data package was created from 3 messy Excel spreadsheets from the Gapminder website. Of course I saved the R scripts, but I also saved rendered versions, so I can see what that process looked like the last time I did it (in 2015, apparently). Click on any .md file here:
You can learn about my data cleaning there without having to download the spreadsheets yourself, install the packages I chose to use, and run all my scripts.
In fact, that README itself was constructed as an .Rmd + a lot of file name discipline! 2017 Jenny would do lots of things differently from ≤2015 Jenny, but let's just ignore that.
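A minimal sketch of that render-and-keep habit, assuming the rmarkdown package and pandoc are installed. The script name and its contents are hypothetical stand-ins, not the actual gapminder scripts; `github_document` is the rmarkdown output format that writes a `.md` file you can save alongside the source.

```r
# Hypothetical cleaning script; any .R or .Rmd file would do.
script_path <- file.path(tempdir(), "01_clean-data.R")
writeLines(c(
  "#' Cleaning step (hypothetical example)",
  "dat <- subset(mtcars, mpg > 25)",
  "nrow(dat)"
), script_path)

# Render only if the tooling is present. The result is a .md file that can be
# committed and read later without re-running the analysis.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  md_path <- rmarkdown::render(script_path,
                               output_format = "github_document",
                               quiet = TRUE)
  print(md_path)
}
```

The rendered file shows code and output together, so a reader can follow the process without installing anything.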
I think the concept of rmarkdown::render() is very powerful for a data analyst. It works for
@dlsweet I’ve worked through nearly all of r4ds and recommend it to anyone who asks me how I learned R! The project organization aspect of R Markdown is what has been giving me the most trouble, so all of these answers (especially @apreshill’s!) have been very helpful. Looking forward to hearing about other R Markdown use cases and ways to organize scripts, etc.
Lots of good stuff so far, but I feel like it's a bit focused on generating reports and analysis where Rmarkdown is really much more than just that.
Rmd files let you mix code (not just R, but other code engines as well) and markdown together to form publication ready documents.
In layman's terms, Rmarkdown can help you:
- write reports for work
- publish scientific journal articles
- write a book
- make a blog
- create documentation for your R package
- build an interactive dashboard
- document your analysis like a science lab notebook
- build a wiki
- create templates for homework assignments
- create templates for technical interviews
All of these options are possible just by adding a few configuration options at the top of the Rmd file (such as title, author, theme, output file format, etc.), using markdown syntax to format your text (such as bold, italics, bullet points, etc.), and inserting "code chunks" to run arbitrary bits of code (such as making a plot using ggplot2 in R, running a SQL query against a remote database just by referring to the connection, performing some text manipulation in Python, etc.). The Rmd file is just a way to section off arbitrary bits of code from other formats/languages, and the tool pandoc and the R package knitr parse the Rmd file and build it into the document you want (defined in the config section at the top).
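To make those three ingredients concrete, here is a rough sketch that writes a tiny Rmd file from R and renders it if rmarkdown and pandoc are available. The title, author, chunk label, and file name are made up for illustration.

```r
# The chunk fence is built with strrep() only so this example does not
# contain a literal chunk fence of its own.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                   # config section (YAML) at the top
  'title: "Example report"',
  'author: "A. Analyst"',
  "output: html_document",
  "---",
  "",
  "Some **bold** and *italic* text, then a bullet list:",  # markdown
  "",
  "- first point",
  "- second point",
  "",
  paste0(fence, "{r speed-plot, echo=FALSE}"),             # a code chunk
  "plot(cars)",
  fence
)

rmd_path <- file.path(tempdir(), "example-report.Rmd")
writeLines(rmd_lines, rmd_path)

if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_path)
}
```

Changing the `output:` line (for example to `pdf_document` or `word_document`) is what switches the target format; the body of the file can stay the same.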
Hopefully you can see how useful Rmarkdown can be. If all you are doing is transforming bits of information and storing the results somewhere else, you might not need Rmarkdown. But if you have a story to tell with the results and want a flexible tool to help you tell that story in the way you see fit for the situation, Rmarkdown is going to be a great asset.
Building a wiki with RMarkdown
I use markdown to document and walk colleagues through the process I've followed to get to the analysis outputs / data products I share with them, as well as problems I've hit that need discussing. And I use different documents during the development process.
I actually start developing code in an rmarkdown notebook. They're really cool because you can run each chunk of code and the output renders below it! So it's really good for sanity checks and having an overview of the analysis visible as you develop it. You can even combine chunks in different languages! So if you needed to access data from a database, you could write an SQL chunk to extract it. Once I think I've got the analysis I want, I decide whether and what code to strip into R scripts or function scripts that can be sourced (or run on a cluster if necessary), echoing @apreshill's approach. But, when I do, I use the chunk naming notation:
## ---- chunk-name ----
in my scripts. This allows me to use the knitr::read_chunk() function in my Rmd to read in the code from my scripts and call the chunks in the original Rmarkdown notebook. This way I only have one copy of the code (so if it changes, it will automatically change in the rmarkdown document when re-rendered) but can still include it in documentation, which I now consider an indispensable part of the workflow. I find being able to show code, inputs, outputs and notes, as well as links to literature or other sources of info that contributed to the development of the code, the best way to show and tell what I did (to my future self as well as others). They're also a great way to document metadata.
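A small sketch of that pattern, assuming knitr is installed. The script name and the chunk labels (`load-data`, `summarise`) are hypothetical.

```r
# A script with labelled sections, using the "## ---- label ----" notation.
script_path <- file.path(tempdir(), "analysis.R")
writeLines(c(
  "## ---- load-data ----",
  "dat <- head(mtcars)",
  "",
  "## ---- summarise ----",
  "summary(dat$mpg)"
), script_path)

# In the Rmd, an early chunk registers the labelled code from the script ...
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::read_chunk(script_path)
}
# ... and later chunks with matching labels and an empty body, for example a
# chunk whose header is {r load-data}, are filled with that code at knit time.
```

The script stays runnable on its own (via `source()` or on a cluster), and the Rmd only references it by label.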
Finally, echoing @foundinblank, I worked for a couple of years remotely from my collaborators, skyping to discuss progress and decide next steps. The best way I found to manage this was to record the progress, ideas and any problems I'd hit (either with the analysis or often even in the data itself) in an rmarkdown document so we had something to go through in our meetings. I also made use of the interactive html features rmarkdown offers, like searchable tables of (reasonably sized) data using functions in the DT package (the default printing of dfs and tibbles is now pretty good in notebooks) or making plots interactive using plotly. That way collaborators could troubleshoot aspects of the data or zoom in to specific parts of plots without asking me to replot stuff or provide separate data files. All the information they needed to think through the problem was there in the report!
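As a rough illustration of those interactive pieces, assuming the DT, ggplot2 and plotly packages are installed (the data here is a stand-in built-in dataset, not anything from my reports):

```r
report_data <- head(mtcars, 20)  # stand-in for a reasonably sized dataset

if (requireNamespace("DT", quietly = TRUE)) {
  # A searchable, sortable HTML table widget
  searchable_table <- DT::datatable(report_data)
}

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("plotly", quietly = TRUE)) {
  static_plot <- ggplot2::ggplot(report_data, ggplot2::aes(wt, mpg)) +
    ggplot2::geom_point()
  # ggplotly() turns a ggplot into a widget readers can zoom and hover over
  interactive_plot <- plotly::ggplotly(static_plot)
}
```

Inside an Rmd chunk knit to HTML, printing `searchable_table` or `interactive_plot` embeds the widget in the report.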
Finally, once you get the hang of markdown, it opens the door to start making websites, blogs and even presentations...all through R!
I share @Ranae's concern when trying to work out how to switch to using RMarkdown for my scientific work. Working out how to use it when I might need to run the same functions over a thousand different inputs is tricky: do I set up the script as a function that can be called from bash and generate a report for each input, or run the whole, massive iteration inside an Rmd chunk?
I've been wanting to try a makefile-and-Rmd-based workflow ever since @datandme tweeted about one, so thanks for posting that, @zkamvar!
With the caveat that I've only read about this topic, have you looked at the Knit with Parameters option for RMarkdown in RStudio? From my understanding it lets you produce a single report and then input different parameters, such as a data set, if the resulting report needs to be the same for multiple data sets.
This is the RStudio site explaining this type of report:
I've used the parameterized reports and they work quite well. The ezknitr vignette shows a good use case for this with multiple data sets in the same project.
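For the many-inputs question above, one possible sketch is to render a single parameterized Rmd once per input using the `params` argument of `rmarkdown::render()`. The file names, the `dataset` parameter, and the helper `output_name()` are hypothetical, and built-in datasets stand in for real inputs.

```r
fence <- strrep("`", 3)  # avoids a literal chunk fence inside this example

rmd_path <- file.path(tempdir(), "param-report.Rmd")
writeLines(c(
  "---",
  'title: "Report"',
  "output: html_document",
  "params:",
  '  dataset: "cars"',          # default value, overridden at render time
  "---",
  "",
  paste0(fence, "{r}"),
  "dat <- get(params$dataset)",
  "summary(dat)",
  fence
), rmd_path)

datasets <- c("cars", "mtcars", "iris")

# Hypothetical naming helper: one output file per input
output_name <- function(dataset) paste0("report-", dataset, ".html")

can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  for (dataset in datasets) {
    rmarkdown::render(rmd_path,
                      params = list(dataset = dataset),
                      output_file = output_name(dataset),
                      output_dir = tempdir(),
                      envir = new.env(),  # fresh environment per report
                      quiet = TRUE)
  }
}
```

The same loop could live in a plain R script called from bash or a Makefile, which keeps the iteration outside the Rmd itself.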
One simple-but-powerful tool for working with long RMarkdown scripts in RStudio that I really like is code folding. You can not just fold functions, but also chunks or entire sections defined by Markdown headings, so a 500 line script can look like
or even shorter, if it's collapsed further. If you organize your headings well and keep the parts you're not working on collapsed, it's really easy to navigate an enormous script. At some point it may still make sense to break parts out, organizing them via a Makefile, vigorous use of the child chunk option, or creative use of bookdown, but for most projects I find splitting is overkill for my purposes.
I use RMarkdown for all my scripts, not just reports, because I can have better organization. If you ever need to run the script repeatedly and find RMarkdown awkward for that, you can always convert an RMarkdown document into a script.
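One way to do that conversion is `knitr::purl()`, which extracts the code chunks into a plain R script. A minimal sketch, assuming knitr is installed; the file names and the chunk label are made up.

```r
fence <- strrep("`", 3)  # avoids a literal chunk fence inside this example

rmd_path <- file.path(tempdir(), "analysis.Rmd")
writeLines(c(
  "Some notes about the plan.",
  "",
  paste0(fence, "{r fit-model}"),
  "fit <- lm(mpg ~ wt, data = mtcars)",
  fence
), rmd_path)

script_path <- file.path(tempdir(), "analysis-extracted.R")
if (requireNamespace("knitr", quietly = TRUE)) {
  # purl() keeps the chunk code and drops the surrounding prose
  knitr::purl(rmd_path, output = script_path, quiet = TRUE)
  cat(readLines(script_path), sep = "\n")
}
```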
With RMarkdown you get
notes, references, and thoughts in markdown format outside the code, much easier to read compared to comments in code.
I keep comments that need to stay with code in code, but found there are a lot of things I want to keep outside of code, especially my plan and findings. That could be extremely helpful if you need to pick up something several months later.
You can organize your code with functions and foldable comments (you can use # comment ---- to create foldable comments in a script, and they will show in the outline), but chunks are more flexible. You can run selected code chunks repeatedly, which is much easier than selecting a section of code and evaluating it. Code chunks that no longer need to be run but are still good to keep can be marked with eval=FALSE and they will not be evaluated.
outline is great to organize long RMarkdown document.
To develop my shiny app, I create an RMarkdown document for every major task, record notes and references, experiment with ideas, etc. When I have working code ready to be incorporated into the shiny app, I copy the code into the app. For longer code sections, I create foldable comments around them and fold them, so it's much easier to select that section and copy it.
Rmarkdown is the ultimate tool for reproducible research/reports.
Tip. Don't forget to save session info at the end.
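A simple way to do this is a final chunk that calls `sessionInfo()`. The sketch below also writes the same information to a text file; the file name is just an example.

```r
# Printed in the last chunk, this records R and package versions in the report
sessionInfo()

# Optionally keep a copy next to the output as plain text
session_path <- file.path(tempdir(), "session-info.txt")
writeLines(capture.output(sessionInfo()), session_path)
```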
I have to confess I never thought to load scripts into chunks. Perfect for me. Thanks!
At eelloo we have been using Rmarkdown for over two years now. All our research reports are designed and generated with it. Recently we introduced a dashboard-like data report that is made completely in Rmarkdown and generated as a pdf document.
Here are some examples gathered in one document:
Of course it's up to you whether this helps convince you to use Rmarkdown, but to me the possibilities it gives (with the help of some LaTeX and ggplot2) seem boundless.
I do a lot of reports that change based on minor tweaks to the code. It's nice to set things up so that I never have to rewrite the verbiage, just change the code (which I would have to do anyway). R Notebooks are also a godsend for viewing the results in-line, and therefore returning to the results later without consciously saving every png. I can't imagine my workflow without them at this point.
@Rdatasculptor, are you able to share some of the techniques used to generate those plots? (I'm not asking for the code).
Are the individual plots from ggplot2 with themes for the rounded borders and all brought together via a grid.arrange formation?
Although I don't deal with infographics I am always looking for ways to be able to pack more plots into a given area where appropriate.
@jessemaegan well actually there are so many other great posts here, that it would take a lot to summarize all of them
@martin.R, I am working on some example code, but I haven't had much time lately to complete it.
Actually there is quite a lot of LaTeX code involved. The rounded corners are made using the mdframed LaTeX package. This package offers rounded boxes in which the ggplot2 graphs can be embedded using Rmarkdown. I don't use grid.arrange; all the layout features are defined in (sub)pages in LaTeX.