For my work, I query the required data from a database. When working remotely, I cannot access this database, so I used to save the necessary data in a workspace that I can access remotely. Now I often read that you should never save workspaces. Any ideas for improving my workflow?
Start with a script that “offlines” your data, then saves it to an RDS file with saveRDS. If you have any time-intensive standard processing of the data frame (that you won’t want to change while offline), you could include that in this script as well. Then, in your analysis/“working” script, start with something like:
puppy_data <- readRDS("puppy.rds")
# Now analyze puppy_data
When you are local and want to refresh your data, re-run the offlining script so that everything is up-to-date, and your analysis script can then work from the updated local copy.
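The two-script split described above could look roughly like the following minimal sketch. The function fetch_puppy_data() is a hypothetical stand-in for the real database query, and the file location under tempdir() and the weight_g column are assumptions made only so the example runs on its own.

```r
# Hypothetical stand-in for the real database query.
fetch_puppy_data <- function() {
  data.frame(
    name = c("Rex", "Bella", "Milo"),
    weight_kg = c(4.2, 3.8, 5.1),
    stringsAsFactors = FALSE
  )
}

# --- Offlining script: run while the database is reachable ---
rds_path <- file.path(tempdir(), "puppy.rds")  # assumed location
raw <- fetch_puppy_data()
# Any time-intensive standard processing would go here.
raw$weight_g <- raw$weight_kg * 1000
saveRDS(raw, rds_path)

# --- Analysis script: needs only the local file ---
puppy_data <- readRDS(rds_path)
# Now analyze puppy_data
```

Because readRDS returns the object instead of restoring it under its old name, the analysis script decides what the data is called, which keeps the two scripts loosely coupled.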
An alternative to nick’s suggestion of saving the data to an RDS file would be to write the database data to, e.g., a CSV file, which may be a more flexible solution. The data would then not be tied to R.
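A minimal sketch of the CSV variant, using made-up data and an assumed file location under tempdir(). One trade-off worth keeping in mind: CSV stores plain text, so column classes such as Date are not preserved and have to be restored after reading.

```r
# Made-up data standing in for the database result.
puppies <- data.frame(
  name = c("Rex", "Bella"),
  born = as.Date(c("2023-01-15", "2023-03-02")),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "puppy.csv")  # assumed location
# row.names = FALSE avoids writing an extra index column.
write.csv(puppies, csv_path, row.names = FALSE)

# Reading back: the date column arrives as character and is converted again.
puppy_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
puppy_csv$born <- as.Date(puppy_csv$born)
```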
Thanks for your replies.
With multiple tables or objects, I think that saveRDS isn’t very suitable, and I would stick to saving workspaces. But in the case of only one object or data.frame, I think that saveRDS would be the better option.
In that case, I would replace the saveRDS command in my suggested pre-processing script with save.image(file = "Processed_puppy.Rdata"), and then load it with load("Processed_puppy.Rdata") in your analysis script. You would also want to clear your environment before running the pre-processing script, so this is one situation in which I would suggest starting your script with rm(list = ls()).
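As a sketch of the multi-object case, the example below uses save() with explicitly named objects rather than save.image(). That swap is an assumption of this example, not the suggestion above: naming the objects makes it visible what ends up in the file, without having to clear the environment first. The data and the path under tempdir() are made up.

```r
# Made-up tables standing in for several database results.
puppies <- data.frame(name = c("Rex", "Bella"), owner_id = c(1L, 2L))
owners <- data.frame(owner_id = c(1L, 2L), city = c("Oslo", "Bergen"))

rdata_path <- file.path(tempdir(), "Processed_puppy.Rdata")  # assumed location
# Only the objects listed here are written to the file.
save(puppies, owners, file = rdata_path)

# In the analysis script: load into a dedicated environment so nothing
# in the current session is silently overwritten.
analysis_env <- new.env()
loaded_names <- load(rdata_path, envir = analysis_env)
```

load() returns the names of the restored objects, which is a cheap way to check what the file contained. A named list passed to saveRDS would be another way to keep several objects in one file.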
The main point of all of this is that you want your work to be reproducible, even (and often especially) by yourself. If you rely on a constantly updated global environment to track your work, all it takes is one undocumented dat$value <- dat$value - offset command sent to the console to make it very difficult to reproduce your analysis six months from now. I speak purely from theory, of course, having never encountered such an absurd situation in my own code.