In brief, current R conventions generally don't promote the use of a main function; instead, the global workspace is often treated as a main function. This is similar to other scripting languages, but Python has a common idiom of using a main function together with if __name__ == "__main__":, which is even included in the standard documentation. One StackOverflow response identified an analogous idiom for R.
Although I recognize R has historically been used interactively and more as a statistics tool than a programming language (so traditional programming principles may not apply), R programming has grown increasingly consistent with software-engineering best practices. I wonder whether R would benefit from a similar convention. Of course, using a main function makes much more sense for Python, since Python programs are organized into modules, which serve as both scripts and libraries, whereas in R we separate the concepts of scripts and packages. However, many of the benefits would be the same, such as making the state and dependencies that affect each function more manageable.
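For illustration, here is a minimal sketch of what such an idiom could look like in an R script. The names main() and center_squares() are made up for this example, and the sys.nframe() guard is only one commonly suggested way to detect top-level execution; it is not necessarily the idiom from the StackOverflow response mentioned above.

```r
# Hypothetical helper: every input arrives as an argument,
# so the function does not depend on any global variable.
center_squares <- function(x) {
  x^2 - mean(x)
}

# main() holds what would otherwise be top-level script code,
# so intermediate variables such as `result` stay local to the call.
main <- function(x = c(1, 2, 3)) {
  result <- center_squares(x)
  print(result)
  invisible(result)
}

# Assumed guard: sys.nframe() is 0 only when this code runs at top level
# (for example under Rscript), not when the file is source()d.
if (sys.nframe() == 0L) {
  main()
}
```

Run with Rscript, the file calls main(); when it is source()d from another script or a test, only the functions are defined, which is roughly the role the Python guard plays.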
Is the common use of global variables (not including functions) in R bad practice? Should the R community encourage use of a similar main-function idiom for larger-scale projects?
This is motivated by this Reddit post on /r/rstats. I was curious what thoughts the RStudio community had on it.
In R scripts (as opposed to packages), both in interactive use and in reproducible scripts, it seems to be fairly conventional to use the global workspace as a sort of main function. Thus, R scripts often populate the global environment with a fair number of global variables. Sometimes these are, in practice, just global "constants", but often they are not, in particular when a variable is re-assigned to the same name, for example:
x <- c(1, 2, 3)
x <- x^2 - mean(x)
This appears to disagree with the traditional software-engineering practice of avoiding global variables or mutable global state. I understand that this principle is only a rule of thumb, but in R the practice seems especially common (perhaps this is inherited from Scheme, on whose scoping rules R's are based?). Nevertheless, the convention of avoiding super-assignment (<<-, or assign() with a specified parent enclosure), together with lexical scoping, generally avoids many of the issues addressed in the original "Global Variables Considered Harmful" article (that is, a function that assigns to a variable with the same name as a global variable, or, in general, another variable up the chain of enclosures, is actually using a local variable in its execution environment). Additionally, R scripts are generally run in isolation using Rscript and modularized by calling packages from the main script, so there is little risk of global state interacting between different scripts. Do R conventions around the global workspace violate the principle of avoiding global variables? Do the semantics of lexical scoping in R, Scheme, and similar languages generally make that principle irrelevant?
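To make the scoping point concrete, here is a small sketch contrasting ordinary assignment with super-assignment. The function names shadow_x() and overwrite_x() and the bookkeeping variables are hypothetical, chosen only for this example.

```r
x <- c(1, 2, 3)  # a variable in the global workspace

# Ordinary assignment inside a function binds a new local x.
# The right-hand side reads the global x; the global x is left unchanged.
shadow_x <- function() {
  x <- x^2 - mean(x)
  x
}

# Super-assignment searches the enclosing environments for x
# and rebinds the one it finds there.
overwrite_x <- function() {
  x <<- x^2 - mean(x)
  invisible(x)
}

local_result <- shadow_x()
x_after_shadow <- x      # still the original vector

overwrite_x()
x_after_overwrite <- x   # the outer x has now been replaced
```

Only the second function changes state outside its own execution environment, which is the behavior the advice against <<- is meant to prevent.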