As per title, suppose I have a long .Rmd file and I want to knit only the part from the start down to a certain line. Is there a way? I tried commenting out the remaining lines, and it usually works but not always (I guess the mixture of text and R chunks inside the comments is not always skipped by the Knit command).
Before we start diving into options, maybe you could give a bit more on what is motivating the question?
- Is the knit just taking too long to run and you're trying to save time?
- Do you want two versions of your doc, a short and a long, or perhaps chapters?
- Something else?
I have multiple reasons for different use cases, but let's start from the most common ones:
- I often have to repeat similar (even if never identical) analyses for different data sets, so I recycle existing R Markdown files. If I knit the whole file, I'm sure to trip over some errors, either because some column has a different name, or because some difference in the data makes one of my statistical models throw an error, etc. So I proceed gradually, starting from the top and modifying what needs to be modified. Most of the debugging happens directly in the editor, using the super-useful "Run all chunks above" command. However, every now and then I'd like to knit the part of the report I have completed: R Markdown HTML reports are a joy to see, and something aesthetically pleasing makes my job nicer.
- In some other cases, the knit is just taking too long to run, as you say. For example, I'm working on a file with ~50 columns and ~100,000 rows, and I use the excellent visdat package to communicate visually what kind of data I'm dealing with, and to show that there are a lot of missing data, often not at random. These are great visualizations, but they take too much time to render each time I knit. I can of course comment out the corresponding chunks, but it would also be cool to be able to knit only up to a certain line of a document.
This might be a good use case for R Notebooks:
How about preprocessing the .Rmd file with something like:
lines <- readLines("Test1.Rmd")
writeLines(purrr::map_chr(1:15, ~ lines[.]), "test1s.Rmd")
and then knit the result?
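To spell out what the snippet above does: it reads the file into a character vector with one element per line, keeps the first 15 elements (the `purrr::map_chr()` call amounts to `lines[1:15]`), and writes them to a second file that can then be knitted. A minimal base-R sketch of the same idea follows. The helper name `truncate_rmd` and the demo document are made up for illustration, and the check for a cut that lands inside a code chunk is only a rough heuristic based on counting fence lines.

```r
# Hypothetical helper (not part of knitr or rmarkdown):
# copy the first `n` lines of an .Rmd file to a new file.
truncate_rmd <- function(input, output, n) {
  lines <- readLines(input)
  n <- min(n, length(lines))
  kept <- lines[seq_len(n)]

  # Rough heuristic: an odd number of fence lines means the cut
  # fell inside a code chunk, which would be left unclosed.
  fence <- strrep("`", 3)
  if (sum(startsWith(trimws(kept), fence)) %% 2 == 1) {
    warning("line ", n, " falls inside a code chunk; the last chunk is left unclosed")
  }

  writeLines(kept, output)
  invisible(output)
}

# Example with a throwaway document written to temporary files
fence <- strrep("`", 3)
demo_in <- tempfile(fileext = ".Rmd")
demo_out <- tempfile(fileext = ".Rmd")
writeLines(c("---", "title: Demo", "---", "", "Intro text.", "",
             paste0(fence, "{r}"), "summary(cars)", fence, "",
             "Unfinished section."), demo_in)

truncate_rmd(demo_in, demo_out, 9)
# rmarkdown::render(demo_out)  # knit only the truncated copy
```

Picking the cut line just after a chunk's closing fence, as in the example, avoids the warning.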
Depending on your use case of "...repeat similar (even if never identical) analyses for different data sets...", this might also be a good use of parameterized reports: https://rmarkdown.rstudio.com/developer_parameterized_reports.html
Note the input dataset section under Parameter User Interfaces
I like this a lot! Let me try.
Thanks for the suggestion: my report is indeed parametric, even if not to such an advanced level as shown in your link (which I will definitely study in the next days).
Having said that, I don't think that's a viable solution for me. There are consistent differences among the data sets:
- the column names are different
- the "structure" of the "missingness" is different (sometimes very few data are missing; other times entire columns are missing, indicative of a failed sensor)
- even the physical sense of the variables can be different.
- In some cases I need to look at all the variables, in others I don't.
Thus, some level of manual editing of the report is in my opinion unavoidable. It's true that in most cases I need to perform an EDA and a survival analysis, but I don't think I can easily parametrize that: depending on the specific data set, I may be content with a Weibull model, or I may need a Cox proportional hazards model, or something even more complicated.
At the very least, I would have to invest a lot of time (which I don't have right now) in studying tidyeval, and in writing scripts which are very flexible in terms of number of variables involved, column names, preprocessing and modeling steps to apply... I don't believe in "automatic Data Science": I think some manual intervention is needed. Or maybe it could be possible, but that would require building a Data Science platform: it's not something I can do on my own with an R Markdown report.
But this is just my personal opinion, and I'm sure that for more standardized tasks (for example, always performing the same kind of analysis on similar data sets that are collected weekly) people can be far more productive using parametrized reports.
By the way, it seems to work (I need to do some more tests), but I'm not sure what it does, exactly. Can you explain? Also, is the writeLines really necessary, or could I knit from a character vector rather than from a file?
This is another interesting option. I thought R notebooks were not really different from R HTML reports, but I may well be wrong! I'll need to study this option too.
You evaluate them piece by piece. If you're knitting to PDF, the whole document has to be created. That said, there are caching options, which you can explore in knitr chunk options:
Aside: I'm going to move this to #R-Markdown, since it's not really IDE-related.
I'm not that familiar with the details of knitr, but as far as I can tell the knit function takes a file path as input, not a connection as some functions do. Maybe there is a way to turn an in-memory string into a file, but I don't know.
However, because of the way files tend to be handled internally, you might find that writing to a file and then reading it is about as fast as reading an in-memory string directly (if there was a way to do that), so I suggest giving it a try and seeing if it meets your performance requirements.
BTW, some languages (when the OS supports it) have the concept of a temporary file, that is, a file that is going to be thrown away when the app ends. These kinds of files are handled differently than regular files and often live just in memory. Unfortunately, R doesn't seem to support this concept... all there is is a function to create a temporary file name.
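For what it's worth, `knitr::knit()` does document a `text` argument that takes a character vector in place of an input file, so the intermediate file may not be needed if Markdown output is enough. A minimal sketch, assuming knitr is installed and using a made-up document; turning the resulting Markdown into HTML is a separate step that is not shown here.

```r
library(knitr)

fence <- strrep("`", 3)  # build the chunk delimiter without typing it literally

# A made-up R Markdown document held in memory, one element per line
rmd_lines <- c(
  "Finished section.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence,
  "",
  "Unfinished section.",
  "",
  paste0(fence, "{r}"),
  "stop('not ready yet')",
  fence
)

# Knit only the first 6 lines; with `text =`, knit() returns the
# knitted Markdown as a single string instead of writing a file.
md <- knit(text = rmd_lines[1:6], quiet = TRUE)
cat(md)
```

The second chunk is never seen by knitr, so its `stop()` call cannot interrupt the run.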
Expanding on what @mara said about chunk options: passing eval=FALSE, echo=FALSE to each chunk you don't want to be included in the knitted document will do the trick.
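As a small illustration of those two chunk options, the sketch below knits a made-up document from a character vector (using the `text` argument of `knitr::knit()`) in which the second chunk carries `eval=FALSE, echo=FALSE`. It assumes knitr is installed; the document content is invented for the example.

```r
library(knitr)

fence <- strrep("`", 3)  # chunk delimiter, built to avoid typing it literally

# Made-up document: the second chunk is switched off via its chunk options
rmd_lines <- c(
  paste0(fence, "{r}"),
  "1 + 1",
  fence,
  "",
  paste0(fence, "{r, eval=FALSE, echo=FALSE}"),
  "Sys.sleep(60); print('slow result')",
  fence,
  "",
  "Closing text."
)

md <- knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

Unlike cutting the file short, the prose around the skipped chunk is still rendered; only the chunk's code and output are left out.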
I will test this too! Thanks
The knitr option discussed in this SO thread may be useful in your situation:
@DavoWW this is perfect! Let me recap the solution here, so that people don't have to read through the SO thread. It's very easy to stop knitting a document at any line of your .Rmd document: just add the line
anywhere in the source document (I put it in an inline expression because it's more compact, but it would still work if you put it in a code chunk). Fantastic!
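The line itself did not survive in this copy of the post. Assuming it refers to `knitr::knit_exit()`, which knitr documents as a way to end knitting early and ignore the rest of the document, a minimal sketch looks like this. It uses the code-chunk form rather than the inline form, assumes knitr is installed, and knits a made-up document from a character vector so that the effect can be checked directly.

```r
library(knitr)

fence <- strrep("`", 3)  # chunk delimiter, built to avoid typing it literally

# Made-up document with an early exit placed between two sections
rmd_lines <- c(
  "Kept paragraph.",
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "knitr::knit_exit()",   # assumption: this is the call the post refers to
  fence,
  "",
  "Dropped paragraph.",
  "",
  paste0(fence, "{r}"),
  "stop('never reached')",
  fence
)

md <- knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

Everything after the chunk that calls `knit_exit()` is skipped, including the chunk that would otherwise raise an error.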
I run into this issue a lot. Often my analyses take some time to run. When I've finished the analysis and made the plots, the knitting part often takes a lot of unnecessary time, because it needs to re-run everything. What makes it worse is when, after 30 min of knitting, some error comes up. Although there are cache options, it's just not convenient to go through chunk by chunk to decide what to cache.
What I would like is to create HTML from the notebook without re-running anything. The .nb.html files sometimes do that, but there are cases when they stop updating due to some knitr errors.
This is why I like Jupyter better since the .ipynb file can just be converted to html without re-running stuff.