First line of every R script?

If I see a line like that in any script I plan on running (or in a Stack Overflow/GitHub reprex), I delete it before I run anything. Since R is used interactively, there's a good chance anyone who runs a script has some objects in their global environment. Occasionally those objects are the result of scraping or long-running code that I really don't want to rerun, even if I've been diligent about keeping the code necessary to produce them. Having them cleared out because I ran someone else's script would result in a lot of swearing (though really I'd blame myself for not reading what I'm running).

I like this better, but while restarting R (and/or RStudio) will clear your loaded namespaces, depending on your RStudio settings it won't necessarily clear your global environment (nor would I usually want it to; I crash R too often). There is a little broom icon in the Environment pane that will clear it interactively, though.

More broadly, the habit stems from a fear of name clashes, and there are better ways to avoid them:

  1. Make sure every variable referenced is created earlier in the script, not interactively or in another script. All packages should be loaded as well; if you can't copy everything and successfully replicate your result with reprex::reprex(), you've forgotten something.*
  2. Name objects well. Don't call data.frames df and vectors x or i, or if you do, expect that you'll forget and overwrite them. Same for variables in data.frames: call things what they are, not overly general names that are likely to clash. This also applies to the parameters of functions, including anonymous ones. function(x){...} is somewhat traditional in lapply and such, but it's way more likely to cause problems than function(sell_price){...}.
  3. Don't make more objects than you have to. Don't use attach, and don't store vectors that you're going to immediately put in a data.frame anyway. Don't make lots of little subset data.frames; they're easier to iterate over as a list of data.frames or a list column of data.frames.
  4. Use RStudio projects. If you're writing code for a single purpose, you're less likely to run into overlaps with conflicting names.
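
As a rough illustration of points 2 and 3, here's a minimal sketch. The sales data and every name in it (sales, region, sell_price, sales_by_region, region_sales) are made up for the example, so treat it as one way to do it rather than the way:

```r
# Hypothetical data; the object and column names are invented for illustration.
sales <- data.frame(
  region     = c("east", "east", "west", "west", "north"),
  sell_price = c(10, 14, 8, 12, 20)
)

# One named list of data.frames instead of separate sales_east,
# sales_west and sales_north objects cluttering the global environment.
sales_by_region <- split(sales, sales$region)

# A descriptive parameter name instead of function(x){...}, so the
# anonymous function can't be confused with some other x lying around.
mean_price_by_region <- vapply(
  sales_by_region,
  function(region_sales) mean(region_sales$sell_price),
  numeric(1)
)

print(mean_price_by_region)
```

Only three objects end up in the global environment, and each name says what it holds.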

Those may not be enough to avoid every name clash, but they're all good habits that will make your collaborators like you more anyway.

* There are exceptions to this when building packages, but R CMD check will alert you to any problems anyway.
