Prepping data before using it in an RMarkdown report?

Hi everyone,

(I think) I have a pretty solid use case for RMarkdown: I am analysing the results of different simulation runs and basically have to run the same code repeatedly to create the plots. I've started to create reports and like what RMarkdown can do, but I am unsure about best practice for data preparation. The simulation data I use is stored in different files that I first have to mix and match to get the plots and analyses I am looking for. Is it generally inadvisable to "do it all" in RMarkdown?
Prepping the data and saving it in a new file that I would then use for the RMarkdown report probably makes sense performance-wise, but for part of my workflow I would again find myself re-running code.

A hint is greatly appreciated 🙂

Thanks,
Anna


I don't think there is a definitive answer to this; it all depends on personal preferences and specific priorities. That said, I am of the opinion that a separate data pre-processing script that produces "analysis ready" intermediate data sets from the raw data is an efficient and orderly way to organize your work while documenting it.
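
A minimal sketch of what such a pre-processing script could look like, using base R only. Everything specific here is an assumption made for illustration: the script name, the `raw` folder, the `run_a.csv`/`run_b.csv` files, and the `step`/`value`/`run` columns are all made up, and the toy files are written to a temp folder just so the example runs on its own.

```r
# prep_data.R: hypothetical pre-processing script (all file and column names are made up)

# Stand-in for the raw simulation output: two small CSV files in a temp folder.
raw_dir <- file.path(tempdir(), "raw")
dir.create(raw_dir, showWarnings = FALSE)
write.csv(data.frame(step = 1:3, value = c(0.1, 0.4, 0.9)),
          file.path(raw_dir, "run_a.csv"), row.names = FALSE)
write.csv(data.frame(step = 1:3, value = c(0.2, 0.3, 0.7)),
          file.path(raw_dir, "run_b.csv"), row.names = FALSE)

# Read every raw file and tag its rows with the run it came from.
files <- list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)
runs <- lapply(files, function(f) {
  d <- read.csv(f)
  d$run <- tools::file_path_sans_ext(basename(f))
  d
})

# Combine into one "analysis ready" table and save it for the report.
analysis_data <- do.call(rbind, runs)
prepped_file <- file.path(tempdir(), "analysis_data.rds")
saveRDS(analysis_data, prepped_file)
```

The report itself would then only need something like `readRDS()` on the saved file in its first chunk, so the slow mixing and matching is re-run only when the raw data actually changes.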


The main reason to use an "all RMarkdown" solution is if there will be changes to the data or analysis that would alter the report. If it's a one-off, especially if it's already completed (say, a plot saved as a png file), it can be less error-prone to just use `\includegraphics` (or its knitr version, `knitr::include_graphics()`) to pop in the plot from a file. If you're just creating the workflow, the code can go in the RMarkdown file itself or in an '.R' file that you source in the setup chunk, with the plot included as a side effect. That way you can use the code both inside and independently of the RMarkdown file. Practice will tell you which is better in that case.
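
A rough sketch of the "source an '.R' file" variant, again in base R. The helper file name `make_plot.R`, the function `make_plot()`, and the toy data are hypothetical; the helper is written to a temp file here only so the example is self-contained.

```r
# Hypothetical helper script, written to a temp file so this example runs on its own.
# In a real project it would be a file such as "make_plot.R" next to the .Rmd.
helper <- file.path(tempdir(), "make_plot.R")
writeLines(c(
  "make_plot <- function(data) {",
  "  # Draws on whatever graphics device is currently active.",
  "  plot(data$step, data$value, type = 'b', xlab = 'step', ylab = 'value')",
  "  invisible(data)",
  "}"
), helper)

# This is the line that would sit in the setup chunk of the report.
source(helper)

# Toy data standing in for the prepared simulation results.
sim <- data.frame(step = 1:5, value = c(0.1, 0.3, 0.2, 0.6, 0.8))

# Called inside a chunk, the plot ends up in the report as a side effect;
# called from a plain R session or script, it goes to the default device.
make_plot(sim)
```

Because the plotting code lives in its own file, the same `make_plot()` can be used from the report and from an interactive session without copying anything.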


Thank you for your replies! I personally feel that a pre-processing script is cleaner but as the data changes frequently, I think I'll have to try a few different ways to get the final result.
