So you restart RStudio all the time?
Yes! Every time I switch to a different/new project.
I use @atiretoo's options and always work with Rmd and Knit documents. Using knitr, objects in the global environment cannot perturb the execution of the script. And, when the document is knit, I still have my previous objects. So, I test scripts in the console and consolidate my programs in an Rmd file.
There's another advantage to always using RMarkdown: it forces me to write my scripts in a literate style, and the results are clean.
That's why I became an absolute fan of knitr and RMarkdown!
Doesn't make sense to me for reasons pointed out above. It's more reliable and flexible to just restart R any time you want to clean the slate. From RStudio, that's easy enough to do with a keyboard shortcut. You don't have to restart the IDE itself.
Also, rm() won't clear out previously loaded libraries. I would add some detach/unloads of non-base packages as well.
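A minimal sketch of that point, under a couple of assumptions: it uses splines (a package that ships with R but is not attached by default) as the example package, and a scratch environment named workspace as a stand-in for the global environment so that running it does not wipe anyone's session.

```r
library(splines)   # ships with R, but is not attached by default

# Scratch environment standing in for the global environment (an assumption
# made so that running this sketch does not wipe your own session).
workspace <- new.env()
assign("prices", c(10, 12, 20), envir = workspace)

# The equivalent of rm(list = ls()): remove every object in the environment.
rm(list = ls(envir = workspace), envir = workspace)
objects_left <- ls(envir = workspace)

# The package is still on the search path after the objects are gone.
attached_after_rm <- "package:splines" %in% search()

# Detaching (and unloading the namespace) is a separate, explicit step.
detach("package:splines", unload = TRUE)
attached_after_detach <- "package:splines" %in% search()
```

Even with the detach step, this only approximates a fresh session, which is why restarting R is the more reliable option.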
If I see a line like that in any script I plan on running (or Stack Overflow/GitHub reprex), I delete it before I run anything. Since R is used interactively, there's a good chance anyone who runs a script has some objects in their global environment. Occasionally those objects are the result of scraping or long-running code that I really don't want to rerun, even if I've been diligent about keeping the code necessary to produce them. To clear them out because I ran someone else's script would occasionally result in a lot of swearing (though really I'd blame myself for not reading what I'm running).
I like this better, but while restarting R (and/or RStudio) will clear your loaded namespaces, depending on your RStudio settings it won't necessarily clear your global environment (nor would I usually want it to; I crash R too often). There is a little broom icon in the Environment pane that will clear it interactively, though.
More broadly, the habit stems from a fear of name clashes, and there are better ways to avoid them:
- Make sure every variable referenced is created earlier in the script, not interactively or in another script. All packages should be loaded as well; if you can't copy everything and successfully replicate your result with reprex::reprex(), you've forgotten something.*
- Name objects well. Don't call data.frames df and vectors x or i, or if you do, expect that you'll forget and write over it. Same for variables in data.frames: Call things what they are, not overly-general names that are likely to clash. This also applies to function (including anonymous ones) parameters. function(x){...} is somewhat traditional in lapply and such, but it's way more likely to cause problems than function(sell_price){...}.
- Don't make more objects than you have to. Don't use attach or store vectors that you're going to immediately put in a data.frame anyway. Don't make lots of little subset data.frames which are more iterable as a list of data.frames or a list column of data.frames.
- Use RStudio projects. If you're writing code for a single purpose, you're less likely to run into overlaps with conflicting names.
That may not be enough to avoid every name clash, but they're all good habits that will make your collaborators like you more anyway.
* There are exceptions to this when building packages, but R CMD check will alert you to any problems anyway.
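To illustrate the naming and "fewer objects" habits from the list above, here is a small sketch. The sales data, its column names, and the region_sales parameter are all made up for the example.

```r
# Hypothetical data: one descriptive data.frame instead of df, east_df, west_df.
sales <- data.frame(
  region     = c("east", "east", "west", "west"),
  sell_price = c(10, 12, 20, 30)
)

# One list of data.frames rather than a separate subset object per region.
sales_by_region <- split(sales, sales$region)

# A descriptively named parameter instead of function(x){...}.
mean_price <- vapply(
  sales_by_region,
  function(region_sales) mean(region_sales$sell_price),
  numeric(1)
)
```

The global environment ends up with three clearly named objects, and adding a region to the data adds no new names.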
I agree with @greg - it's way better to do this outside of your script than inside of it. Apart from objects and packages there are many other global settings (e.g. options(), par(), environment variables) that won't get cleanly reset. I also highly recommend never saving or loading your workspace:
That ensures you always get a clean slate when you restart RStudio, and forces you to record all important steps in code.
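A rough sketch of why removing objects is not enough: options and environment variables survive it. The variable name DEMO_FLAG and the digits value are arbitrary choices for illustration, and par() is left out only to avoid opening a graphics device.

```r
# Change two pieces of global state that are not objects.
old_opts <- options(digits = 3)     # options() returns the previous values
Sys.setenv(DEMO_FLAG = "on")        # hypothetical environment variable

# Create and remove an ordinary object.
scratch <- 1
rm(scratch)

# Neither setting was touched by rm().
digits_after_rm <- getOption("digits")
flag_after_rm   <- Sys.getenv("DEMO_FLAG")

# Cleaning up has to be done by hand (or by restarting R).
options(old_opts)
Sys.unsetenv("DEMO_FLAG")
```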
I run my current analysis non-interactively on a remote server, so the first line is a shebang:
#!/apps/R/3.2.2/bin/Rscript --vanilla
(It uses a job system, so pointing to the environment doesn't work.)
Just curious, why isn't this the default setting?
I also had to "fix" various newcomer issues by making them start with a fresh workspace each time.
And within a project?
If I encounter issues, I restart the R session interactively to reset the environment and namespaces.
Make sure you wrap most logic in functions and you will not have a lot of clutter in your global environment.
And even better, it allows you to reuse them more easily. If you have a couple of functions that belong together, combine them in a package.
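As a small sketch of the idea (the function name summarise_prices and its contents are invented for the example), intermediate values created inside a function never reach the global environment:

```r
# Hypothetical helper: all intermediate work stays local to the function.
summarise_prices <- function(prices) {
  centred_prices <- prices - mean(prices)   # exists only during the call
  list(
    mean   = mean(prices),
    spread = sqrt(mean(centred_prices^2))
  )
}

result <- summarise_prices(c(10, 12, 20, 30))

# The intermediate vector did not leak into the global environment.
leaked <- exists("centred_prices", envir = globalenv(), inherits = FALSE)
```

Only the function and its result are left behind, so there is much less to clash with later.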
I actually started making analysis packages, where the analysis itself is a vignette, the data is contained within the package and then you can also nicely declare dependencies on other packages.
It makes it easy to share your analysis with others, because everything is within the package.
(Note this might not be a great idea if your data is huge.)
Are there any additional settings in RStudio that need to be tweaked to ensure that restarting R leads to an entirely clean slate? I was just working with a student and found that on his machine the Cmd-Shift-F10 restart preserved the objects in his global environment. I verified that the "Restore .RData into workspace at startup" option was unchecked and that the "Save workspace to .RData on exit" option was set to "Never," so I am flummoxed as to why restarting R didn't result in an empty environment. Has anyone else encountered this? (Student had to run off to another class, so I can't provide details on his RStudio version, but can say that he was running it on a Mac.)
I like to turn scientific notation off too
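One common way to do that is the scipen option; this is a sketch, and 999 is just a conventionally large penalty rather than a required value:

```r
# A large scipen makes R strongly prefer fixed over scientific notation.
old_opts <- options(scipen = 999)

fixed <- format(1e10)   # "10000000000" rather than "1e+10"

# Restore the previous setting so the change does not outlive this example.
options(old_opts)
```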
Yeah, just do what the screenshot from @hadley tells you to do in settings & Cmd+Shift+F10 must be the most frequently used combo. That's pretty much it.
Is it possible you're running up against Project Options vs Global Options in RStudio? I have puzzled myself in this way with respect to other settings.
I don't use rm() as the first line, and am slightly annoyed when I see others use it. It feels clutter-ish.
For most analysis projects, my first line is library(tidyverse).
This is also one of the great reasons to get familiar with packaging your analysis in an internal R package. My favourite way to deploy an analysis is to write an R package which the server then installs and runs. This helps with a few different things, but one of them is that it helps you make sure that everything needed to run an analysis is present in the code. It ends up being a lot easier than it seems to get started with this stuff, and it really smooths the analysis -> sharing -> deployment curve.
I've learned a ton reading through these responses - thank you to everyone for sharing!
In all honesty, my first line is almost always a commented header with the project/script objective and the due date.
You know in RStudio you can have multiple projects open at the same time, right? Each with its own global environment.
I never noticed that was in the desktop version before, or I might not have switched to primarily using the server.