Knit Rmd to html, docx and pdf - what is the fastest way of doing this?

Hi

I have an R notebook which takes a few minutes to compile. Now I would like to create html, pdf and docx output from it via a makefile. At the moment I am using

Rscript -e "rmarkdown::render('notebook.Rmd', output_format = 'all')"

but this executes all the R code three times, which takes a long time. Would it be possible to create the .md once and then use pandoc commands to convert it? Or what would be the fastest way of obtaining the three targets from the .Rmd file?

Thanks,

Rainer

One solution could be to use the caching mechanism of knitr, which R Markdown relies on. With this, your chunks will be run only once.

See knitr options docs about cache and maybe the recent blog post by yihui about cache invalidation if you want to go deeper.

If you only want to use the cache at render time, I think you can activate it only when rendering with output_format = 'all' and then delete the cache directory afterwards.
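A minimal sketch of that idea, under several assumptions: the notebook declares a hypothetical `use_cache` parameter in its YAML header (`params: use_cache: false`), its setup chunk calls `knitr::opts_chunk$set(cache = params$use_cache)`, and the cache lives in rmarkdown's default `<name>_cache` directory next to the input. The helper names `cache_dir_for()` and `render_all_cached()` are made up for illustration.

```r
# Hypothetical helper: default cache directory used by rmarkdown for an input
# file, e.g. "notebook.Rmd" -> "notebook_cache".
cache_dir_for <- function(input) {
  paste0(tools::file_path_sans_ext(input), "_cache")
}

# Hypothetical helper: render all formats listed in the YAML header with
# caching switched on, then optionally delete the cache directory.
# Assumes the Rmd declares a `use_cache` parameter and passes it to
# knitr::opts_chunk$set(cache = params$use_cache) in its setup chunk.
render_all_cached <- function(input, keep_cache = TRUE) {
  out <- rmarkdown::render(
    input,
    output_format = "all",
    params = list(use_cache = TRUE)
  )
  if (!keep_cache) {
    unlink(cache_dir_for(input), recursive = TRUE)
  }
  invisible(out)
}

# Example call, left commented out because it needs the notebook and pandoc:
# render_all_cached("notebook.Rmd")
```

Note that, by default, rmarkdown keeps a separate cache sub-directory per output format, so the cache mainly pays off on repeated runs; that is why `keep_cache` defaults to `TRUE` in this sketch.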

As I find this a very interesting question, I looked at the rmarkdown::render documentation and saw that there are several parameters that may help here, if one wants to explore.

  • clean = FALSE won't delete the intermediate files after each format is rendered, but I am really not sure you can reuse them across formats, and they may be erased by the next rendering.

  • run_pandoc = FALSE will return, per the doc, "the path of the Markdown output file, with attributes knit_meta (the knitr meta data collected from code chunks) and intermediates (the intermediate files/directories generated by render() )."
    However, I am not sure how you can then use that file to generate your output with pandoc.
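To explore what these two parameters return, something like the following could be tried. It is only a sketch: `knit_only()` is a made-up wrapper name, and the choice of `html_document` as the format to knit for is an assumption.

```r
# Hypothetical wrapper: knit the document but stop before pandoc runs,
# keeping the intermediate files on disk.
knit_only <- function(input, format = "html_document") {
  rmarkdown::render(
    input,
    output_format = format,
    run_pandoc = FALSE,  # return the knitted Markdown instead of the final output
    clean = FALSE        # do not delete intermediate files
  )
}

# Example use, left commented out because it needs the notebook and pandoc:
# md <- knit_only("notebook.Rmd")
# print(as.character(md))           # path of the knitted Markdown file
# str(attr(md, "knit_meta"))        # knitr meta data collected from chunks
# print(attr(md, "intermediates"))  # intermediate files/directories
```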

It seems that the caching mechanism for R chunks is the simplest way to go here.

The idea of knitting the .Rmd once, keeping the intermediate .md, and converting it three times via Pandoc certainly sounds much more efficient than compiling the Rmd three times. Unfortunately, there is a fundamental problem. By default, rmarkdown::render() uses different figure and cache directories for different output formats. For example, the figure format for PDF output is pdf, and figures are generated to foo_files/figure-latex/; for HTML output, the figure directory is foo_files/figure-html/.
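One way to see these differences for a given document is to print the relevant chunk options from inside a chunk and compare the values after rendering to each format. The snippet below is a small sketch meant to be pasted into a chunk of the notebook; outside of render() it simply shows knitr's own defaults.

```r
# Look up the graphics device, figure path and cache path that are in effect
# for the current output format.
opts <- knitr::opts_chunk$get(c("dev", "fig.path", "cache.path"))
str(opts)
```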

That means the intermediate .md file will be different for different output formats, unless you don't have figure output or you force all output formats to use the same figure format, in which case you could use clean = FALSE and run_pandoc = FALSE as @cderv mentioned, and then call rmarkdown::pandoc_convert() separately.
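A rough sketch of that route, under stated assumptions: the notebook's setup chunk forces one figure format and path for every output (for example `knitr::opts_chunk$set(dev = "png", fig.path = "figures/")`), and the script is run for a notebook whose knitted Markdown contains nothing format-specific. The helper names `target_files()` and `convert_knitted()` and the target table are made up for illustration. The converted files will not use the rmarkdown templates, so they may look different from what render() produces.

```r
# Hypothetical helper: output file names for each target, e.g.
# "notebook.Rmd" -> notebook.html, notebook.docx, notebook.pdf.
target_files <- function(input, exts = c("html", "docx", "pdf")) {
  paste0(tools::file_path_sans_ext(basename(input)), ".", exts)
}

# Hypothetical helper: knit once, then let pandoc convert the intermediate
# Markdown file to each target format.
convert_knitted <- function(input) {
  # Knit only; keep the intermediate .md instead of running pandoc.
  md <- rmarkdown::render(
    input,
    output_format = "html_document",
    run_pandoc = FALSE,
    clean = FALSE
  )
  # Pandoc writer per output file; a .pdf output goes through the latex writer.
  writers <- c("html", "docx", "latex")
  outputs <- target_files(input)
  for (i in seq_along(outputs)) {
    rmarkdown::pandoc_convert(
      as.character(md),
      to = writers[i],
      output = outputs[i],
      options = "--standalone",
      wd = dirname(normalizePath(input))  # resolve paths next to the notebook
    )
  }
  invisible(outputs)
}

# Example call, left commented out because it needs the notebook, pandoc and LaTeX:
# convert_knitted("notebook.Rmd")
```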

It really depends on how time-consuming your code chunks are. Personally I think the easiest solution is to turn on caching on these chunks, and cache them three times for the three output formats, respectively. The next time it should be fairly fast to regenerate the output files if you don't modify these code chunks.

Thanks a lot @cderv and @yihui for these answers - very informative and helpful.

I wasn't sure if the caching was independent of the targets - as it is, I am on the safe side, as I am using a cache already, and in the makefile (see https://gist.github.com/rkrug/6aab472682210381a1947411aef20dba for the makefile) I have a target clean cache which deletes the cache.

Thanks a lot

Rainer