Project-oriented workflow; setwd(), rm(list = ls()) and computer fires

At a high level, I like the project organization advice given in Good enough practices in scientific computing (full disclosure: I am a co-author, but didn't write this bit):

As a rule of thumb, divide work into projects based on the overlap in data and code files. If 2 research efforts share no data or code, they will probably be easiest to manage independently. If they share more than half of their data and code, they are probably best managed together, while if you are building tools that are used in several projects, the common code should probably be in a project of its own.

You'll have to mentally adjust all of that for your case, where shared data is the "tool" that is used in several projects.

For your specific situation, and with R in mind, you could put shared data extracts into a data package, so each project just calls library() instead of copying and loading files such as delimited text. Many companies, such as Airbnb, have also written internal packages that make it easier to use internal data sources consistently. If you had one of those, each of your individual projects could contain the logic to do its own data extraction.
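To make the data package idea a bit more concrete, here is a minimal sketch using only base R. The package name (sharedextracts), the dataset (customers) and its columns are all made up for illustration, and a real internal package would want fuller metadata and documentation than this.

```r
# Hypothetical package skeleton, built in a temp directory for illustration.
pkg_dir <- file.path(tempdir(), "sharedextracts")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION; LazyData makes datasets available by name
# once the package is attached. All field values are placeholders.
writeLines(c(
  "Package: sharedextracts",
  "Title: Shared Data Extracts (Hypothetical Example)",
  "Version: 0.0.1",
  "Description: Data extracts shared across several analysis projects.",
  "License: GPL-3",
  "LazyData: true"
), file.path(pkg_dir, "DESCRIPTION"))

# A data-only package exports nothing, so the NAMESPACE has no directives.
writeLines("# no exports: data-only package", file.path(pkg_dir, "NAMESPACE"))

# Stand-in for a real extract (made-up values).
customers <- data.frame(
  id = 1:3,
  region = c("north", "south", "east"),
  stringsAsFactors = FALSE
)

# Datasets shipped with a package live in data/ as .rda files.
save(customers, file = file.path(pkg_dir, "data", "customers.rda"))

# Assumed next steps, left commented out so the sketch has no side effects:
# install.packages(pkg_dir, repos = NULL, type = "source")
# library(sharedextracts)
# head(customers)
```

With something like this installed, each project would replace its own "copy the file, then read it" step with a single library() call, and the extract would be updated in one place.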
