Breaking up huge Rmd analyses

I'm doing a series of analyses on some single cell RNA-seq data, and ended up lumping about 10 different analyses into the same R Markdown file. The file has grown to over 1300 lines and is still growing. Often I have to sift through the entire big mess to find where I generated an object. I also realized that I often have objects in the environment that were generated in a previous analysis and take up a lot of memory, but are no longer needed for the current analysis; they're there because I reran earlier code chunks. And whenever I restart the R session and clear the environment, I have to manually and selectively rerun code chunks until I get to the section I'm currently working on, because the current section doesn't need all the results from the previous sections; but doing that isn't reproducible and wastes time.

So I think it's time to break that one big R Markdown file into smaller pieces: for example, one file for the standard workflow (data normalization, dimension reduction, and clustering) that saves the resulting Seurat object as an rds, a second file that reads the rds back in for differential expression between clusters and cell type annotation, and so on. The biggest hurdle right now is that some objects generated early in the workflow are needed in many different analyses; I suppose I should save those as rds files early on and load them wherever they're needed. Then I'll write an R script that renders the reports in order, something like the sketch below.
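To make the hand-off concrete, here's roughly what I have in mind (file and object names here are made up):

```r
# At the end of 01_normalize_cluster.Rmd:
saveRDS(seu, "output/seu.rds")

# At the top of 02_de_annotation.Rmd (and any later report that needs it):
seu <- readRDS("output/seu.rds")
```

And then a small driver script so the whole thing can be rebuilt from scratch:

```r
# render_all.R -- render the reports in dependency order
library(rmarkdown)

render("01_normalize_cluster.Rmd")  # writes output/seu.rds
render("02_de_annotation.Rmd")      # reads output/seu.rds
render("03_further_analysis.Rmd")
```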

Have you also been in this kind of situation? Any tips on breaking up a large R Markdown file into smaller and more modular pieces?


It sounds to me like you could benefit from something like drake (https://github.com/ropensci/drake).

It is designed specifically for situations where you have complicated relationships between various artifacts that need to be kept up to date.
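As a rough sketch of what a drake plan could look like for your workflow (the helper functions and file names are placeholders you'd define yourself, not part of drake's API):

```r
library(drake)

plan <- drake_plan(
  # cluster_cells() and run_de() are hypothetical helpers for your own steps
  seu_raw  = readRDS(file_in("data/seu_raw.rds")),
  seu      = cluster_cells(seu_raw),
  de_table = run_de(seu),
  report   = rmarkdown::render(
    knitr_in("02_de_annotation.Rmd"),
    output_file = file_out("02_de_annotation.html"),
    quiet = TRUE
  )
)

make(plan)  # rebuilds only the targets whose dependencies changed
```

After `make()` finishes, `readd(de_table)` pulls the cached result into your session, so you never pay for rerunning upstream steps that haven't changed.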

