How to create a pipeline to run a few .Rmd files in a specific sequence

Hi,

I have only used .Rmd files so far, and whenever I work on a project I end up with a few individual .Rmd files. Sometimes these files use the same data, but mostly each file uses its own data. When I need to update the project, I run each .Rmd file separately, which can get tedious. I am thinking there should be a way to create a pipeline that includes these .Rmd files in a sequential order, so that when I run the pipeline, all the included .Rmd files run automatically without my going through each one separately. For example, say we have xyz.Rmd and abc.Rmd, and the data associated with each is xyz.xlsx and abc.xlsx, respectively. How do I go about creating a pipeline to run abc.Rmd and then xyz.Rmd, in that order? I have never created a pipeline before. Could you please help me with this technique and give an example with code?

Thank you so much!

The way I've approached this is not technically a proper pipeline, but it does avoid having to open 10 Rmd files and run each one individually. If the Rmd files are in the same location and named 01_x, 02_y, ... 10_z, then you can use `file_list <- list.files(dir, pattern = "[.]Rmd$", full.names = TRUE)` to create a list of file paths, and then `walk(file_list, render)` (walk() from purrr, render() from rmarkdown) to run through all your Rmds in order.
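A minimal sketch of this approach in base R, assuming the Rmd files sit in one folder with zero-padded numeric prefixes (01_, 02_, ..., 10_) so that alphabetical order matches run order (without the padding, 10_z would sort before 2_y). The helper name render_dir_in_order() and its render_fun argument are hypothetical: render_fun defaults to rmarkdown::render and exists only so the loop can be tried with a stand-in function. A plain for loop takes the place of purrr::walk() here; both stop at the first error.

```r
# Hypothetical helper: render every .Rmd file in `dir`, in file-name order.
# Zero-padded prefixes (01_, 02_, ...) make alphabetical order = run order.
render_dir_in_order <- function(dir = ".", render_fun = rmarkdown::render) {
  # full.names = TRUE keeps the folder in each path, so this works even
  # when `dir` is not the current working directory
  rmd_files <- list.files(dir, pattern = "[.]Rmd$", full.names = TRUE)
  rmd_files <- sort(rmd_files)  # list.files() already sorts; be explicit
  for (f in rmd_files) {
    message("Rendering ", f)
    render_fun(f)  # an error here stops the loop, like purrr::walk()
  }
  invisible(rmd_files)
}
```

With the reports in a folder called "ABC", render_dir_in_order("ABC") would render them one after the other.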

As you are mentioning pipelines, I must point you toward some packages that are designed for pipelines:

  • drake, which is widely used,
  • and its more recent rewrite, targets.

You can also look at using a Makefile (GNU Make) to organize your pipeline.

One simple solution would also be to create an R script that renders the Rmd files, one step at a time, in the order you want. You would just execute this script, which calls rmarkdown::render() for each file, whenever you need to update.
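One way such a script could look, as a sketch: the step order lives in a character vector, and each file is rendered in turn. The file name run_pipeline.R, the helper run_pipeline(), and its render_fun argument are hypothetical; the tryCatch() wrapper only adds the name of the failing step to the error message before stopping.

```r
# run_pipeline.R (hypothetical file name): render each step in a fixed order.
steps <- c("abc.Rmd", "xyz.Rmd")  # order matters: abc first, then xyz

run_pipeline <- function(steps, render_fun = rmarkdown::render) {
  for (i in seq_along(steps)) {
    message(sprintf("[%d/%d] %s", i, length(steps), steps[i]))
    tryCatch(
      render_fun(steps[i]),
      # Re-raise the error, naming the step, so later steps do not run
      error = function(e) {
        stop("Pipeline stopped at ", steps[i], ": ",
             conditionMessage(e), call. = FALSE)
      }
    )
  }
  invisible(steps)
}

# run_pipeline(steps)
```

In the real script you would end with an uncommented run_pipeline(steps) and run it from the project folder with Rscript run_pipeline.R, or with source("run_pipeline.R") from RStudio.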

Hope it helps

If you're looking to go the route of calling the RMarkdown files from an R script, then you might find these blog posts helpful as examples:

Rich Majerus on "Creating Multiple Reports with RMarkdown" (I know this isn't exactly what you want to do, but the principles involved are similar)

Andrew Brooks, "Render reports directly from R scripts"
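Applied to the abc/xyz example from the question, the same idea could pair each Rmd with its data file and pass the file path as a parameter through render()'s params argument. This is a sketch that assumes each Rmd declares a data_file parameter in its YAML header (params: with a data_file: entry) and reads it with something like readxl::read_excel(params$data_file); the pipeline table, render_with_data(), and render_fun are hypothetical names.

```r
# Hypothetical table mapping each report to its data file, in run order
pipeline <- data.frame(
  rmd  = c("abc.Rmd", "xyz.Rmd"),
  data = c("abc.xlsx", "xyz.xlsx"),
  stringsAsFactors = FALSE
)

render_with_data <- function(pipeline, render_fun = rmarkdown::render) {
  for (i in seq_len(nrow(pipeline))) {
    # Assumes each Rmd declares `data_file` under `params:` in its YAML header
    render_fun(pipeline$rmd[i], params = list(data_file = pipeline$data[i]))
  }
  invisible(pipeline$rmd)
}
```

Keeping the order and the data paths in one table means adding a report is a one-row change rather than an edit to the loop.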


Thanks @eringrand! Do we set the directory in the console directly or in another script? Sorry, I'm very new to this!

Thanks @cderv! I will look into these.

For the R script, do I just do the following, as long as all the Rmd files and the new R script are in the same location, say a folder called "ABC"?

library(rmarkdown)

render(
  xyz.Rmd,
  abc.Rmd
)

Sorry, I'm very new to this!

Thank you!

Thanks @kaijabean! These are definitely a good start for me I believe!

You can do something very explicit, if a bit wordy:

library(rmarkdown)

render("xyz.Rmd")
render("abc.Rmd")

This would execute the documents one after the other, in the same R environment.

If you want a more programmatic way, you can follow @eringrand's suggestion:

library(rmarkdown)
rmd_files <- c("xyz.Rmd", "abc.Rmd")
# or, if you want all the Rmd files in the current directory:
# rmd_files <- list.files(pattern = "[.]Rmd$")
output_files <- purrr::map(rmd_files, render)

purrr is a package for iteration; base R's lapply(rmd_files, render) would do the same thing here.

If you don't want the same R environment to be shared, you can set the envir argument of render() to new.env(), or you can render each file in a fresh R session running as a background process, using callr, for example, or xfun::Rscript(). This is more advanced, and you only need it in some specific cases.
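A sketch of both options. render_isolated() and render_in_fresh_session() are hypothetical helper names. The first passes new.env(parent = globalenv()) rather than a bare new.env(), because inside a function a bare new.env() would have the function's own frame as its parent, making the loop's variables visible to the Rmd. The second assumes the callr package is installed; callr::r() runs the given function in a separate R process and returns its value (here, the output file path from render()).

```r
# Sketch: render each file in its own environment, so objects created by one
# Rmd are not visible to the next (objects in the global environment still are).
render_isolated <- function(files, render_fun = rmarkdown::render) {
  for (f in files) {
    render_fun(f, envir = new.env(parent = globalenv()))
  }
  invisible(files)
}

# Sketch: render in a brand-new R session via callr (requires callr).
# Nothing from the current session, including loaded packages, carries over.
render_in_fresh_session <- function(file) {
  callr::r(function(f) rmarkdown::render(f), args = list(file))
}
```

The new-environment version is cheaper; the separate-session version is closer to knitting each file with the Knit button, since each document must load its own packages.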

Hope it helps !

Thank you @cderv! I am going to try each of these so I get familiar with them. Thank you so much, this has been really helpful!
