Assume we are working on a script, data_cleaning.R, that ultimately creates a large .rds object. One idiom I have used (and this could be the wrong approach) to avoid re-creating that object each time we source the script is something like this:
data_cleaning.R
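library(readr)  # provides write_rds()

PATH_FOR_DATA <- "objects/big_df.rds"
# Halt here if the cached file already exists.
# Note: exists() looks for an R object by that name; file.exists() checks the disk.
stopifnot(!file.exists(PATH_FOR_DATA))
# Do a bunch of stuff...
write_rds(df, PATH_FOR_DATA)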
This works great. But let's say we want to version control this script for collaboration. It works fine the first time -- anyone "new" will pull down the script and since they've never run it, they will create their own copy of the big_df.rds object locally.
But what happens when someone changes the data_cleaning.R script? How do you ensure collaborators always have the latest-and-greatest copy? Maybe something with a build number? Could still be problematic?
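To make the "build number" idea concrete, here is a rough sketch that uses a hash of the script as the build number. It is only an illustration under some assumptions: the helper names (script_hash, needs_rebuild, record_build) and the ".md5" stamp file written next to the .rds are made up for this example, and temporary files stand in for data_cleaning.R and big_df.rds. Only base R is used (tools::md5sum, saveRDS).

```r
# Hypothetical helpers: the names and the ".md5" stamp-file convention are
# assumptions for illustration, not an established idiom.
script_hash <- function(script_path) unname(tools::md5sum(script_path))

# TRUE when the cached object is missing or was built from a different
# version of the script than the one currently on disk.
needs_rebuild <- function(script_path, data_path,
                          stamp_path = paste0(data_path, ".md5")) {
  if (!file.exists(data_path) || !file.exists(stamp_path)) return(TRUE)
  recorded <- readLines(stamp_path, n = 1, warn = FALSE)
  !identical(script_hash(script_path), recorded)
}

# Record which version of the script produced the cached object.
record_build <- function(script_path, data_path,
                         stamp_path = paste0(data_path, ".md5")) {
  writeLines(script_hash(script_path), stamp_path)
  invisible(stamp_path)
}

# Demo: temporary files stand in for data_cleaning.R and big_df.rds.
script <- tempfile(fileext = ".R")
data   <- tempfile(fileext = ".rds")
writeLines("df <- data.frame(x = 1:3)", script)

first_run <- needs_rebuild(script, data)       # no cache yet
saveRDS(data.frame(x = 1:3), data)             # "Do a bunch of stuff..."
record_build(script, data)
after_build <- needs_rebuild(script, data)     # cache matches the script

writeLines("df <- data.frame(x = 1:4)", script)  # a collaborator edits the script
after_edit <- needs_rebuild(script, data)        # stale cache is detected
```

One caveat with this design: the hash changes on any edit, including comment-only changes, so it can trigger rebuilds that are not strictly needed. It also only tracks the script itself, not any input data the script reads.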