First line of every R script?


#1

I’d gotten into the habit of starting every script with:

rm( list = ls()); gc();

Even before I load any library()'s.

Was wondering if it makes sense to people.

X.


#2

I also often stick to that habit, at least when I am iterating through a script many times. Just two notes:

  • it is safer to use
    rm(list = ls(all = TRUE))
    
  • Do you think it makes a difference to call gc()?
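
On the first note, here is a minimal sketch of what `all = TRUE` changes. It uses a throwaway environment (`scratch` and the object names are made up for this example) so nothing in your own workspace is touched. `all` partially matches the `all.names` argument of `ls()`.

```r
# Throwaway environment standing in for the global environment (hypothetical name)
scratch <- new.env()
assign("visible_obj", 1, envir = scratch)
assign(".hidden_obj", 2, envir = scratch)  # dot-prefixed names are hidden from plain ls()

default_listing <- ls(scratch)                    # only the visible object
full_listing    <- ls(scratch, all.names = TRUE)  # includes the dot-prefixed object

# Clearing with plain ls() leaves the dot-prefixed object behind
rm(list = ls(scratch), envir = scratch)
left_behind <- ls(scratch, all.names = TRUE)

# Clearing with all.names = TRUE removes everything
rm(list = ls(scratch, all.names = TRUE), envir = scratch)
after_full_clear <- ls(scratch, all.names = TRUE)
```

So a plain `rm(list = ls())` can still leave objects such as `.Random.seed` in place.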

#3

gc() has been a more recent habit.

A little bit superstitious, but it can’t hurt, can it? :slight_smile:


#4

I used to do something like that, but have changed my habit to instead restart the R session prior to running the script.

  • This ensures that I’m starting from a clean environment (including which libraries are loaded) and supports future reproducibility.

  • Also, if I were to share my script with someone else, I won’t clobber their environment when they try it out.


#5

Like @jchou, I tend to restart R.

But I think it’s still worth adopting the idea, because (1) it does not hurt and (2) it is also nice for sharing scripts (others might not restart the session all the time).

It also makes me wonder: which comments do you start with? I always try to provide a one-liner with a broad explanation and then, on a second line, the date on which I wrote or edited the script.


#6

It does, as long as you’re not sharing that script! Bad behavior to start by clobbering someone’s workspace! @jennybryan had a good post/tweet/something about this, I think.

If you’re using RStudio, there’s a cleaner solution. Go to Tools | Global Options | General and untick one box and change a dropdown:
[screenshot of the General options pane]
This way you get the same behavior as rm(list=ls()) but your scripts will play well when shared with others.
I didn’t figure this out on my own, but I can’t remember where I heard it suggested as a good practice. Probably on Twitter.


#7

So you restart RStudio all the time?


#8

Yes! Every time I switch to a different/new project.


#9

I use @atiretoo’s options and always work with Rmd and knit documents.

With knitr, objects in the global environment cannot perturb the execution of the script. And after the document is knit, my previous objects are still there. So, I test scripts in the console and consolidate my programs in an Rmd file.
There’s another advantage to always using RMarkdown: it forces me to write my scripts in a literate style, and the results are clean.

That’s why I became an absolute fan of knitr and RMarkdown!


#10

Doesn’t make sense to me for reasons pointed out above. It’s more reliable and flexible to just restart R any time you want to clean the slate. From RStudio, that’s easy enough to do with a keyboard shortcut. You don’t have to restart the IDE itself.

Also, rm() won’t clear out previously loaded libraries. I would add some detach/unloads of non-base packages as well.
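
A rough sketch of what that detach step could look like. `detach_non_base()` is a name invented here, and it relies on `sessionInfo()$otherPkgs` to list the attached non-base packages. Treat it as an approximation: unloading can fail when other loaded namespaces still depend on a package, and a session restart remains the more reliable reset.

```r
# Hypothetical helper: detach every attached package that is not part of base R.
detach_non_base <- function() {
  # otherPkgs is NULL when only base packages are attached
  other <- names(utils::sessionInfo()$otherPkgs)
  for (pkg in other) {
    # unload = TRUE also tries to unload the namespace, which may not always succeed
    detach(paste0("package:", pkg), character.only = TRUE, unload = TRUE)
  }
  invisible(other)  # names of the packages that were detached
}

detached <- detach_non_base()
```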


#11

If I see a line like that in any script I plan on running (or Stack Overflow/GitHub reprex), I delete it before I run anything. Since R is used interactively, there’s a good chance anyone who runs a script has some objects in their global environment. Occasionally those objects are the result of scraping or long-running code that I really don’t want to rerun, even if I’ve been diligent about keeping the code necessary to produce them. To clear them out because I ran someone else’s script would occasionally result in a lot of swearing (though really I’d blame myself for not reading what I’m running).

I like this better, but while restarting R (and/or RStudio) will clear your loaded namespaces, depending on your RStudio settings it won’t necessarily clear your global environment (nor would I usually want it to; I crash R too often). There is a little broom icon in the Environment pane that will clear it interactively, though.

More broadly, the habit stems from a fear of name clashes, and there are better ways to avoid them:

  1. Make sure every variable referenced is created earlier in the script, not interactively or in another script. All packages should be loaded as well; if you can’t copy everything and successfully replicate your result with reprex::reprex(), you’ve forgotten something.*
  2. Name objects well. Don’t call data.frames df and vectors x or i, or if you do, expect that you’ll forget and write over them. Same for variables in data.frames: call things what they are, not overly general names that are likely to clash. This also applies to the parameters of functions (including anonymous ones). function(x){...} is somewhat traditional in lapply and such, but it’s way more likely to cause problems than function(sell_price){...}.
  3. Don’t make more objects than you have to. Don’t use attach or store vectors that you’re going to immediately put in a data.frame anyway. Don’t make lots of little subset data.frames which are more iterable as a list of data.frames or a list column of data.frames.
  4. Use RStudio projects. If you’re writing code for a single purpose, you’re less likely to run into overlaps with conflicting names.

That may not be enough to avoid every name clash, but they’re all good habits that will make your collaborators like you more anyway.

* There are exceptions to this when building packages, but R CMD check will alert you to any problems anyway.
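
To illustrate points 2 and 3, here is a small sketch using the built-in `mtcars` data (my choice of example, not something from the posts above). One named list replaces three separate subset data.frames, and the anonymous function gets a descriptive parameter name instead of `x`.

```r
# One list of data.frames instead of mtcars_4cyl, mtcars_6cyl, mtcars_8cyl
cars_by_cyl <- split(mtcars, mtcars$cyl)

# Iterate over the list; the parameter name says what each element is
mean_mpg_by_cyl <- vapply(
  cars_by_cyl,
  function(cyl_group) mean(cyl_group$mpg),
  numeric(1)
)
```

This adds two objects to the global environment however many groups there are, which leaves fewer names to clash.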


#12

I agree with @greg - it’s way better to do this outside of your script than inside of it. Apart from objects and packages there are many other global settings (e.g. options(), par(), environment variables) that won’t get cleanly reset. I also highly recommend never saving or loading your workspace:

That ensures you always get a clean slate when you restart RStudio, and forces you to record all important steps in code.
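
A quick sketch of the point about global settings, using a throwaway environment (`scratch` and `DEMO_ENV_VAR` are made-up names) in place of the real workspace. Removing objects leaves options and environment variables exactly as they were.

```r
old_opts <- options(digits = 4)          # change a global option, keeping the old value
Sys.setenv(DEMO_ENV_VAR = "still here")  # hypothetical environment variable

scratch <- new.env()                     # stands in for the global environment
assign("some_object", 1:10, envir = scratch)
rm(list = ls(scratch, all.names = TRUE), envir = scratch)

objects_left <- ls(scratch, all.names = TRUE)  # no objects remain
digits_after <- getOption("digits")            # option is unchanged by rm()
envvar_after <- Sys.getenv("DEMO_ENV_VAR")     # environment variable is unchanged too

# Undo the demo's own changes
options(old_opts)
Sys.unsetenv("DEMO_ENV_VAR")
```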


#13

I run my current analysis non-interactively on a remote server, so the first line is a shebang:

#!/apps/R/3.2.2/bin/Rscript --vanilla

(It uses a job system, so pointing to the environment doesn’t work.)


#14

Just curious, why isn’t this the default setting?
I also had to “fix” various newcomer issues by making them start with a fresh workspace each time. :slight_smile:


#15

And within a project?


#16

If I encounter issues, I restart the R session interactively to reset the environment and namespaces.


#17

Make sure you wrap most logic in functions and you will not have a lot of clutter in your global environment.
And even better, it allows you to reuse them more easily. If you have a couple of functions that belong together, combine them in a package.
I actually started making analysis packages, where the analysis itself is a vignette, the data is contained within the package and then you can also nicely declare dependencies on other packages.
It makes it easy to share your analysis with others, because everything is within the package.
(Note this might not be a great idea if your data is huge :wink: )
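
A tiny sketch of the first point, with made-up names (`summarise_prices`, `sell_price`). Intermediate values created inside a function stay local to it and never reach the global environment.

```r
# Hypothetical analysis step wrapped in a function
summarise_prices <- function(sell_price) {
  price_range <- range(sell_price)  # local helper variable, gone once the function returns
  list(
    mean_price = mean(sell_price),
    spread     = diff(price_range)
  )
}

price_summary <- summarise_prices(c(10, 12, 15))

# Only the function and its result exist outside; price_range does not
leaked <- exists("price_range")
```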


#18

Are there any additional settings in RStudio that need to be tweaked to ensure that restarting R leads to an entirely clean slate? I was just working with a student and found that on his machine the Cmd-Shift-F10 restart preserved the objects in his global environment. I verified that the “Restore .RData into workspace at startup” option was unchecked and that the “Save workspace to .RData on exit” option was set to “Never,” so I am flummoxed as to why restarting R didn’t result in an empty environment. Has anyone else encountered this? (Student had to run off to another class, so I can’t provide details on his RStudio version, but can say that he was running it on a Mac.)


#19

I like to turn scientific notation off too.
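
One common way to do that is the `scipen` option. The sketch below assumes that is the approach meant here, and the value 999 is just a conventionally large penalty.

```r
old_opts <- options(scipen = 0)    # start from R's default penalty
default_format <- format(1e10)     # scientific notation under the default

options(scipen = 999)              # heavily penalise scientific notation
fixed_format <- format(1e10)       # fixed notation: "10000000000"

options(old_opts)                  # restore the previous setting
```

Note that this is another global setting that `rm(list = ls())` does not reset.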


#20

Yeah, just do what the screenshot from @hadley tells you to do in settings & Cmd+Shift+F10 must be the most frequently used combo. That’s pretty much it.