Assume we are working on a script, data_cleaning.R, that ultimately creates a large .rds object. One idiom I have used (and this could be the wrong approach) to avoid re-creating that object each time we source the script is something like this:
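    library(readr)  # provides write_rds()
    PATH_FOR_DATA <- "objects/big_df.rds"
    # Stop here if the file has already been created
    # (file.exists() checks the path; exists() would look for an R object with that name)
    stopifnot(!file.exists(PATH_FOR_DATA))
    # Do a bunch of stuff...
    write_rds(df, PATH_FOR_DATA)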
This works great. But let's say we want to version control this script for collaboration. It works fine the first time -- anyone "new" will pull down the script and, since they've never run it, they will create their own copy of the big_df.rds object locally.
But what happens when someone changes the data_cleaning.R script? How do you ensure collaborators always have the latest-and-greatest copy of the object? Maybe something with a build number? Or would that still be problematic?
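To make the build-number idea concrete, here is a rough sketch of one possible variant: instead of a hand-maintained number, store a hash of the script next to the cached object and rebuild whenever the hash changes. The helper name cached_build, the build_fn argument, and the ".md5" sidecar file are all hypothetical names made up for illustration. It uses base R's saveRDS()/readRDS() rather than readr, and it assumes the .rds and .md5 files are kept out of version control.

```r
# Hypothetical helper: rebuild the cached object only when the script changed.
# `build_fn` stands in for the "do a bunch of stuff" step.
cached_build <- function(script_path, data_path, build_fn) {
  # Sidecar file holding the hash of the script that produced the cache
  hash_path <- paste0(data_path, ".md5")
  current_hash <- unname(tools::md5sum(script_path))

  up_to_date <- file.exists(data_path) &&
    file.exists(hash_path) &&
    identical(readLines(hash_path, warn = FALSE), current_hash)

  if (up_to_date) {
    return(readRDS(data_path))  # reuse the cached object
  }

  df <- build_fn()              # expensive step
  dir.create(dirname(data_path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(df, data_path)
  writeLines(current_hash, hash_path)
  df
}

# Possible usage (the function passed as build_fn is a placeholder):
# df <- cached_build("data_cleaning.R", "objects/big_df.rds", make_big_df)
```

With something like this, a collaborator who pulls a changed data_cleaning.R would get a fresh big_df.rds on the next run, because the stored hash no longer matches. It would not notice changes to the raw input data or to package versions, so it may still be problematic in those cases.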