I don’t use setwd or rm(list=ls()), preferring instead to make sure RStudio never saves my workspace on exit, to restart R habitually, and to use ProjectTemplate for caching, loading and munging. I like the workflow that ProjectTemplate sets up for you. Here’s how I typically use it.
Starting clean, I throw all my data into subfolders within the data folder. I set recursive data loading to FALSE in the global.dcf config file, and create .R scripts within data that contain the code to load the data files. I do this rather than rely on automatic data loading by file type because files are rarely as clean as I need them to be. Using scripts, I can use data.table::fread or readxl easily, supply column types up front, filter out unneeded columns, and work through whatever other ugly data steps are required.
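As a rough illustration of such a loader script, here is a minimal sketch. The script name (data/01-sales.R), the object name sales.raw and the columns are all made up for the example, and a temporary CSV stands in for a real raw file so the snippet runs on its own.

```r
# Hypothetical loader script, e.g. data/01-sales.R.
# File, object and column names are placeholders, not from a real project.
library(data.table)

# A temporary CSV stands in for a messy raw file under data/.
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,order_date,amount,notes",
  "001,2020-01-15,19.99,first order",
  "002,2020-02-03,5.00,"
), raw_path)

# Keep only the needed columns and fix their types up front,
# so that ids such as "001" do not lose their leading zeros.
sales.raw <- fread(
  raw_path,
  select     = c("id", "order_date", "amount"),
  colClasses = c(id = "character", order_date = "character", amount = "numeric")
)

# One of the "ugly data steps": parse the date explicitly.
sales.raw[, order_date := as.Date(order_date)]
```

In a real project the tempfile part would be replaced by the path to a file in one of the data subfolders.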
Once I have the data loaded, I’ll restart R, load the project, and use the caching function to have ProjectTemplate automatically cache all the files it loads. There are some quirks, such as having to use dots instead of underscores in the object names, but that’s tolerable, if slightly annoying.
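To make the naming quirk concrete, here is a small base R stand-in for the kind of renaming described above. It is not ProjectTemplate’s own code; to_object_name is a hypothetical helper and the exact rules in the package may differ by version and configuration.

```r
# Rough imitation of turning a file name into an object name with dots.
# Hypothetical helper for illustration only, not part of ProjectTemplate.
to_object_name <- function(file_name) {
  # Drop the directory and the extension, e.g. "data/sales_2019.csv" -> "sales_2019"
  stem <- tools::file_path_sans_ext(basename(file_name))
  # Replace runs of underscores, whitespace or hyphens with a single dot,
  # then let make.names() ensure the result is a valid R name.
  make.names(gsub("[_[:space:]-]+", ".", stem))
}

to_object_name("data/sales_2019.csv")  # "sales.2019"
```

Naming objects with dots from the start in the loader scripts avoids having the cached name differ from the one used in the code.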
From there, I set the data loading config to FALSE so I don’t accidentally load the data again, and then work on munging. This is where I start to add more cache(’…’) commands, usually after any major munging step. I might use rm as well to remove the original datasets from the global environment if they’re no longer needed. I tend not to overwrite existing objects (I regret it every time I do).
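A munging step along these lines might look like the sketch below. The script name (munge/01-clean-sales.R) and the objects sales.raw and sales.clean are invented for the example. The cache call is guarded so that it only runs inside a project folder with ProjectTemplate installed, which is an assumption of this sketch rather than something the package requires.

```r
# Hypothetical munge script, e.g. munge/01-clean-sales.R.
library(data.table)

# Stand-in for an object that a script in data/ would already have loaded.
sales.raw <- data.table(
  id     = c("001", "002", "002", "003"),
  amount = c(19.99, 5, 5, NA)
)

# Major munging step: drop missing amounts and exact duplicate rows.
# The result goes into a new object instead of overwriting sales.raw.
sales.clean <- unique(sales.raw[!is.na(amount)])

# Cache the result, then drop the original. Only attempted when a cache/
# folder exists in the working directory and ProjectTemplate is installed.
if (dir.exists("cache") && requireNamespace("ProjectTemplate", quietly = TRUE)) {
  ProjectTemplate::cache("sales.clean")
  rm(sales.raw)
}
```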
Once munging is complete, I’ll set munging to FALSE in the global config as well, restart R, and start on the analysis. Every new analysis uses load.project() to load the cached entries, though I can override that by setting cache loading to FALSE and loading only particular objects if needed. Ideally, all datasets are in such a state that analyzing them becomes trivial compared to the real hard work: importing and munging.
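For the "only particular objects" case, one possible approach is sketched below. It assumes that cached objects sit in cache/ as <name>.RData files and that load.project() accepts configuration overrides as named arguments. Both hold for the ProjectTemplate versions I know of, but check against your installed version. load_cached and sales.clean are hypothetical names.

```r
# Hypothetical helper: load selected cached objects by name.
# Assumes the cache stores each object as <cache_dir>/<name>.RData.
load_cached <- function(names, cache_dir = "cache", envir = globalenv()) {
  for (nm in names) {
    load(file.path(cache_dir, paste0(nm, ".RData")), envir = envir)
  }
  invisible(names)
}

# Only runs inside a ProjectTemplate project (config/global.dcf present).
if (file.exists(file.path("config", "global.dcf"))) {
  library(ProjectTemplate)
  # Assumed override: skip loading the whole cache for this session.
  load.project(cache_loading = FALSE)
  # Then pull in just the object this analysis needs.
  load_cached("sales.clean")
}
```

For most analyses a plain load.project() is enough. The override is only worth it when the cache is large and the analysis touches a small part of it.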