Advice on structuring R projects

I love using R, but compared with the Python workflow, I have trouble getting a setup that feels as "stable".
My situation is:

  • I have a git repository that hosts an R project and an renv environment.
  • In this repository, I have multiple analyses that are intertwined to a certain extent.
  • I would like to test most of the functionality and only call tested functions in a main analysis script.

I'm aware that the usual approach is an R package. With a package, however, you can't have a granular folder structure, since no subfolders are allowed in R/.
Splitting up the repository, I would end up with a myriad of small packages, dedicated R projects and renv environments, which doesn't seem correct given the partial interdependence, like helper functions I want to be accessible from everywhere.

Also, approaches like the drake or targets packages are awesome for reproducibility, but they seem to be geared towards a single pipeline/analysis per R project.

Just sourcing everything everywhere (as shown in the example workflows for these packages) is dangerous. I want small functions to break down functionality and enhance reusability, and the more of these functions there are, the more likely it is that one silently overwrites another. I could use a common prefix for all functions in a file, but this might result in overly long names and less expressive code.
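To make the overwriting problem concrete, here is a minimal sketch in base R. The file names and the clean() function are made up for this example, and sourcing each file into its own environment is only one possible workaround, not a recommendation from any of the packages mentioned here.

```r
# Hypothetical helper files that both define a function named clean().
helper_dir <- tempfile("helpers_")
dir.create(helper_dir)
text_file <- file.path(helper_dir, "text_helpers.R")
numeric_file <- file.path(helper_dir, "numeric_helpers.R")
writeLines("clean <- function(x) trimws(x)", text_file)
writeLines("clean <- function(x) x[!is.na(x)]", numeric_file)

# Sourcing both files into one environment: the second definition
# replaces the first without any warning.
shared <- new.env()
sys.source(text_file, envir = shared)
sys.source(numeric_file, envir = shared)
shared$clean("  a  ")  # whitespace is kept: the text version is gone

# One environment per file keeps the two definitions apart.
load_helpers <- function(path) {
  env <- new.env(parent = globalenv())
  sys.source(path, envir = env)
  env
}
text_helpers <- load_helpers(text_file)
numeric_helpers <- load_helpers(numeric_file)

text_helpers$clean("  a  ")         # trims the whitespace
numeric_helpers$clean(c(1, NA, 3))  # drops the NA
```

The environment name then acts as the prefix (text_helpers$clean), so the function names themselves can stay short.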

Testing with testthat is mostly intended for packages. I can use test_dir, but that has to be launched clumsily from a shell script or something similar, and there is no longer any IDE support for running single tests.
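For reference, this is roughly what running testthat outside a package looks like. It is a sketch only: the project layout, helpers.R and add_one() are invented for illustration, and it assumes testthat is installed.

```r
library(testthat)

# Hypothetical project layout, created in a temporary directory.
proj <- tempfile("proj_")
dir.create(file.path(proj, "R"), recursive = TRUE)
dir.create(file.path(proj, "tests"))
helper_file <- file.path(proj, "R", "helpers.R")
writeLines("add_one <- function(x) x + 1", helper_file)

# The test file sources the code under test itself, because there is no
# package for testthat to load. deparse() quotes the path safely.
test_path <- file.path(proj, "tests", "test-helpers.R")
writeLines(c(
  sprintf("source(%s, local = TRUE)", deparse(helper_file)),
  "test_that('add_one increments its input', {",
  "  expect_equal(add_one(1), 2)",
  "})"
), test_path)

# Run every test file in the directory; stop_on_failure turns a failing
# test into an error, which suits scripts and CI jobs.
results <- test_dir(file.path(proj, "tests"), stop_on_failure = TRUE)

# A single file can be run the same way.
test_file(test_path)
```

This works, but every test file has to know where the code it tests lives, which is exactly the bookkeeping a package would otherwise handle.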

The import package solves a lot of these issues. It lets you treat a file as a workspace in which you can define as many functions or constants as you want. If you import a function from that file, the other functions defined or imported within the file are accessible to that function but not available from the outside. This gives very neat encapsulation, but it doesn't work with the drake or targets packages: they can't detect the nested dependencies of imported functions, which defeats their purpose.
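The encapsulation described above looks roughly like this. The module file scores.R and both function names are hypothetical, and the sketch assumes an installed version of import that supports script modules and the .directory argument.

```r
# Hypothetical module file with one private helper and one function
# that is meant to be imported.
module_dir <- tempfile("modules_")
dir.create(module_dir)
writeLines(c(
  "scale_to_unit <- function(x) (x - min(x)) / (max(x) - min(x))",
  "summarise_scores <- function(x) mean(scale_to_unit(x))"
), file.path(module_dir, "scores.R"))

# Import only summarise_scores; scale_to_unit stays inside the module.
import::from("scores.R", summarise_scores, .directory = module_dir)

summarise_scores(c(0, 5, 10))  # can still call the private helper
exists("scale_to_unit")        # the helper is not visible from outside
```

The imported function carries its module environment with it, which is what makes the helper reachable for it and, presumably, what hides the dependency from static code analysis.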

So my question is: what do you recommend as an approach for setting up something slightly bigger? All the guides on R that I have found cover only very simple analysis setups and don't really answer my question. They rarely deal with testability or predictable behaviour of an analysis.

Thanks a lot in advance!

Michael
