From RStudio, is it possible to knit only part of an R Markdown document?


#1

As per the title, suppose I have a long .Rmd file and I want to knit only the part from the start down to a certain line. Is there a way? I tried commenting out the remaining lines, and it usually works, but not always (I guess the mixture of text and R chunks inside the comments is not always skipped by the Knit command).


#2

Before we start diving into options, could you say a bit more about what is motivating the question?

  • Is the knit just taking too long to run and you're trying to save time?
  • Do you want two versions of your doc, a short and a long, or perhaps chapters?
  • Something else?

#3

I have multiple reasons for different use cases, but let's start from the most common ones:

  • I often have to repeat similar (even if never identical) analyses for different data sets, so I recycle existing R Markdown files. If I knit the whole file, I'm sure to trip over some errors, either because some column has a different name, or because differences in the data cause one of my statistical models to throw an error, etc. So I proceed gradually, starting from the top and modifying what needs to be modified. Most of the debugging happens directly in the editor, using the super-useful "Run all chunks above" command. However, every now and then I'd like to knit the part of the report I have completed: R Markdown HTML reports are a joy to see, and something aesthetically pleasing makes my job nicer 🙂
  • In some other cases, the knit is just taking too long to run, as you say. For example, I'm working on a file with ~50 columns and ~100,000 rows, and I use the excellent vis_dat and vis_miss (from the visdat package) to show visually what kind of data I'm dealing with, and that a lot of data are missing, often not at random. These are great visualizations, but they take too much time to render every time I knit. I can of course comment out the corresponding chunks, but it would also be cool to be able to knit only up to a certain line of a document.

#4

This might be a good use case for R Notebooks:
https://rmarkdown.rstudio.com/r_notebooks.html


#5

How about preprocessing the .Rmd file with something like:

lines <- readLines("Test1.Rmd")

writeLines(purrr::map_chr(1:15, ~ lines[.]), "test1s.Rmd")

and then knit the result?
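For what the snippet above does: `purrr::map_chr(1:15, ~ lines[.])` simply picks lines 1 to 15 of the file one at a time, so it is equivalent to plain subsetting, `lines[1:15]`. A slightly more general sketch of the same idea follows; `truncate_rmd()` is a hypothetical helper, not part of any package. It writes the truncated copy next to the original (rather than into `tempdir()`) so that relative paths used inside the report keep working. Cutting in the middle of a chunk leaves an unclosed code fence, so the cut line should fall outside any chunk.

```r
# Hypothetical helper: copy the first `n_lines` lines of an .Rmd file into a
# sibling file called "partial_<name>.Rmd" and return that file's path.
# Writing next to the original (rather than to tempdir()) keeps any relative
# paths used inside the report (e.g. read.csv("data.csv")) working.
truncate_rmd <- function(input, n_lines) {
  lines <- readLines(input, warn = FALSE)
  output <- file.path(dirname(input), paste0("partial_", basename(input)))
  # head() copes with n_lines larger than the file; lines[1:n] would add NAs
  writeLines(head(lines, n_lines), output)
  output
}

# Usage (assumes rmarkdown is installed and Test1.Rmd exists):
# rmarkdown::render(truncate_rmd("Test1.Rmd", 15))
```

The rendered report then ends up as `partial_Test1.html`, and the original .Rmd is never modified.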


#6

Depending on your use case of "...repeat similar (even if never identical) analyses for different data sets...", this might also be a good use of https://rmarkdown.rstudio.com/developer_parameterized_reports.html

Note the input dataset section under Parameter User Interfaces.
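As a rough sketch of that approach: assume a hypothetical `report.Rmd` whose YAML header declares a `params:` field with a `data_file` entry, which the report reads as `params$data_file`. The same report can then be rendered once per data set by passing `params` and `output_file` to `rmarkdown::render()`. The file names and `render_all()` are illustrative only; the `render` argument exists just so the loop can be tried without rmarkdown and defaults to `rmarkdown::render`.

```r
# Hypothetical: render the same parameterised report.Rmd once per data file.
# report.Rmd is assumed to declare in its YAML header:
#   params:
#     data_file: "data.csv"
# and to read the data with read.csv(params$data_file) or similar.
render_all <- function(data_files, input = "report.Rmd",
                       render = rmarkdown::render) {
  vapply(data_files, function(f) {
    # e.g. "site_A.csv" -> "site_A_report.html"
    out <- paste0(tools::file_path_sans_ext(basename(f)), "_report.html")
    render(input, params = list(data_file = f), output_file = out)
    out
  }, character(1), USE.NAMES = FALSE)
}

# Usage (assumes report.Rmd and the CSV files exist):
# render_all(c("site_A.csv", "site_B.csv"))
```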


#7

I like this a lot! Let me try.


#8

Thanks for the suggestion: my report is indeed parameterized, even if not to the advanced level shown in your link (which I will definitely study in the next few days).

Having said that, I don't think that's a viable solution for me. There are substantial differences among the data sets:

  • the column names are different
  • the "structure" of the "missingness" is different (sometimes very little data is missing; other times entire columns are missing, indicating a failed sensor)
  • even the physical meaning of the variables can be different
  • in some cases I need to look at all the variables, in others I don't.

Thus, some level of manual editing of the report is, in my opinion, unavoidable. It's true that in most cases I need to perform an EDA and a survival analysis, but I don't think I can easily parameterize that: depending on the specific data set, I may be content with a Weibull model, or I may need a Cox proportional hazards model, or something even more complicated.

At the very least, I would have to invest a lot of time (which I don't have right now) in studying tidyeval, and in writing scripts that are very flexible in terms of the number of variables involved, column names, and the preprocessing and modeling steps to apply... I don't believe in "automatic Data Science": I think some manual intervention is needed. Or maybe it is possible, but it would require building a Data Science platform: it's not something I can do on my own with an R Markdown report.

But this is just my personal opinion, and I'm sure that for more standardized tasks (for example, always performing the same kind of analysis on similar datasets collected weekly) people can be far more productive using parameterized reports.


#9

By the way, it seems to work (I need to do some more tests), but I'm not sure exactly what it does. Can you explain? Also, is the writeLines really necessary, or could I knit from a character vector instead of from a file?


#10

This is another interesting option. I thought R notebooks were not really different from R HTML reports, but I may well be wrong! I'll need to study this option too.


#11

With notebooks, you evaluate chunks piece by piece. If you're knitting to PDF, the whole document has to be created. That said, there are caching options, which you can explore in the knitr chunk options:
https://yihui.name/knitr/options/#chunk_options
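For slow chunks like the `vis_dat()`/`vis_miss()` plots mentioned above, a minimal sketch of the two usual ways to turn caching on: mark only the expensive chunk with `cache=TRUE` in its header, or enable caching for every chunk from the setup chunk. knitr stores the results in a cache directory next to the document and re-runs a cached chunk when its code or options change; a chunk that depends on objects created in another chunk may additionally need the `dependson` option. The chunk label and `my_data` below are hypothetical.

```r
# Option 1: in the header of a single expensive chunk (label is hypothetical):
#   ```{r missingness-plots, cache=TRUE}
#   visdat::vis_miss(my_data)
#   ```
#
# Option 2: in the setup chunk at the top of the .Rmd, cache every chunk:
knitr::opts_chunk$set(cache = TRUE)
```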

Aside: I'm going to move this to #R-Markdown, since it's not really IDE-related.


#12

I'm not that familiar with the details of knitr, but as far as I can tell the knit function takes a file path as input, not a connection as some functions do. Maybe there is a way to turn an in-memory string into a file, but I don't know.

However, because of the way files tend to be handled internally, you might find that writing to a file and then reading it is about as fast as reading the in-memory string directly (if there were a way to do that), so I suggest giving it a try and seeing whether it meets your performance requirements.

BTW, some languages (when the OS supports it) have the concept of a temporary file, that is, a file that is thrown away when the app ends. These kinds of files are handled differently than regular files and often live only in memory. Unfortunately, R doesn't seem to support this concept... all there is is a function to create a temporary file name.
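On the in-memory question from #9: `knitr::knit()` also accepts a `text` argument as an alternative to `input`. The R Markdown source is passed as a character vector and the knitted Markdown comes back as a character string instead of being written to a file. The catch is that this stops at Markdown: `rmarkdown::render()`, which produces the HTML report, takes a file path, so for a full HTML report writing a (temporary) file is still the simplest route. A minimal sketch, with a tiny inline document standing in for `readLines("Test1.Rmd")[1:15]`:

```r
library(knitr)

# In practice this would be readLines("Test1.Rmd")[1:15];
# a tiny inline document keeps the example self-contained.
rmd <- c(
  "Some text before the chunk.",
  "",
  "```{r}",
  "1 + 1",
  "```"
)

# knit() with `text` returns the knitted Markdown instead of writing a file
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```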


#13

@danr yep, Python supports that through the tempfile module!


#14

Expanding on what @mara said about chunk options: passing eval=FALSE, echo=FALSE to each chunk you don't want to be included in the knitted document will do the trick.
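Rather than editing every chunk, a variation (a sketch, not tested against every setup) is to change the chunk defaults at the point where the knitted output should stop: `knitr::opts_chunk$set()` called inside a chunk applies to all later chunks. The Markdown text after that point is still included, inline `r` expressions are not affected by chunk options, and chunks that set `eval`/`echo` explicitly override the new defaults. The example knits an in-memory document with `knitr::knit(text = ...)` only so that it is self-contained.

```r
library(knitr)

rmd <- c(
  "```{r}",
  "x <- 'first chunk ran'",
  "x",
  "```",
  "",
  "```{r, include = FALSE}",
  "# From here on: neither evaluate nor show chunks",
  "knitr::opts_chunk$set(eval = FALSE, echo = FALSE)",
  "```",
  "",
  "```{r}",
  "stop('this chunk should never run')",
  "```"
)

md <- knit(text = rmd, quiet = TRUE)
cat(md)
```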


#15

I will test this too! Thanks


#16

The knitr option discussed in this SO thread may be useful in your situation:
https://stackoverflow.com/questions/33705662/how-to-request-an-early-exit-when-knitting-an-rmd-document


#17

@DavoWW this is perfect! Let me recap the solution here, so that people don't have to read through the SO thread. It's very easy to stop knitting a document at any line of your .Rmd document: just add the line

`r knitr::knit_exit()`

anywhere in the source document (I put it in an inline expression because it's more compact, but it would still work if you put it in a code chunk). Fantastic!
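A small self-contained check of that behaviour, using `knitr::knit(text = ...)` so no file is needed. Here the call sits in a code chunk with `echo = FALSE` (the chunk form mentioned above); everything after that chunk, text and chunks alike, is left out of the output. The document content is made up for illustration.

```r
library(knitr)

rmd <- c(
  "Part one is knitted.",
  "",
  "```{r}",
  "1 + 1",
  "```",
  "",
  "```{r, echo = FALSE}",
  "knitr::knit_exit()",
  "```",
  "",
  "Part two (e.g. slow plots) is never reached.",
  "",
  "```{r}",
  "stop('not evaluated')",
  "```"
)

md <- knit(text = rmd, quiet = TRUE)
cat(md)
```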


#18

@danr Actually, R does provide a "disposable temporary file" facility. Use the with_file function of the withr package.
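For reference, the documented `withr::with_file()` pattern is to create the file inside the code block and let `with_file()` delete it when the block finishes; it is shown below in comments, followed by a base-R sketch of the same idea using `on.exit()`. `with_temp_rmd()` and `partial.Rmd` are hypothetical names. Since knitr resolves relative paths from the .Rmd file's directory, the sketch lets the caller choose where the throwaway file is written.

```r
# withr version (file name is hypothetical); partial.Rmd is deleted
# as soon as the braced block finishes, the rendered HTML is kept:
# withr::with_file("partial.Rmd", {
#   writeLines(readLines("Test1.Rmd")[1:15], "partial.Rmd")
#   rmarkdown::render("partial.Rmd")
# })

# Base-R sketch of the same idea: write `lines` to a throwaway .Rmd file,
# call `fun` with its path, and delete the file however `fun` exits.
with_temp_rmd <- function(lines, fun, dir = tempdir()) {
  path <- tempfile("partial_", tmpdir = dir, fileext = ".Rmd")
  on.exit(unlink(path), add = TRUE)
  writeLines(lines, path)
  fun(path)
}

# Usage (assumes rmarkdown is installed and Test1.Rmd exists):
# with_temp_rmd(readLines("Test1.Rmd")[1:15], rmarkdown::render, dir = ".")
```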


#19

@pteetor this is brilliant! I didn't know about that. I have to tweet this gem out :grin:


#20

I run into this issue a lot. My analyses often take some time to run. When I've finished the analysis and made the plots, the knitting part often takes a lot of unnecessary time, because it needs to re-run everything. What makes it worse is that after 30 minutes of knitting, some error comes up. Although there are cache options, it's just not convenient to go through the document chunk by chunk to decide what to cache.

What I would like is to create HTML from a notebook without re-running anything. The .nb.html files sometimes do that, but there are cases where they stop updating due to some knitr errors.

This is why I like Jupyter better: the .ipynb file can just be converted to HTML without re-running anything.