Creating global object vs. storing it as sysdata.rda

mauro_lepore · November 21, 2017, 9:47pm

My question is about style. Skimming through code in the tidyverse package, I was surprised to see the code chunk below. I had expected that any non-exported object would be saved in sysdata.rds.

Why is "core" defined in this way versus stored in sysdata.rds?
Is it because it is simple enough to be created in place, and because this way the code becomes more readable?

From: https://github.com/tidyverse/tidyverse/blob/master/R/attach.R

core <- c("ggplot2", "tibble", "tidyr", "readr", "purrr", "dplyr", "stringr", "forcats")

core_loaded <- function() {
  search <- paste0("package:", core)
  core[search %in% search()]
}

jimhester · November 21, 2017, 10:22pm

The first principle should always be what is most readable and maintainable. Here the data is a short vector that is easily readable in one line, so it makes sense to just include it inline with the rest of the code.

If the data was very long or was rectangular, maybe sysdata.rda would be used, but in simple cases it is best to keep the data as close to the code as possible.

Also note that if you are using sysdata.rda, the extension is .rda, not .rds. The file is an image file generated with save(), not a single-object file saved with saveRDS().
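
To make the save() / saveRDS() distinction concrete, here is a minimal sketch using temporary files. The object name and values are placeholders for illustration, not the actual contents of any package's sysdata.rda.

```r
# Hypothetical internal object; the name and values are for illustration only.
core <- c("ggplot2", "tibble")

rda_path <- tempfile(fileext = ".rda")
rds_path <- tempfile(fileext = ".rds")

# save() stores one or more objects together with their names.
save(core, file = rda_path)

# saveRDS() stores a single object without its name.
saveRDS(core, file = rds_path)

# load() restores objects under their original names into an environment
# and returns the names it restored.
env <- new.env()
restored_names <- load(rda_path, envir = env)

# readRDS() returns the object itself; the caller chooses the name.
anything <- readRDS(rds_path)
```

Because load() restores objects by name, an .rda file suits a bundle of internal objects, whereas readRDS() hands back a single value to assign wherever you like.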

mauro_lepore · November 22, 2017, 3:48pm

Thanks for helping me develop a sense of style! And thanks for noting my mistake above (not sysdata.rds but sysdata.rda).

Would this alternative be acceptable? [My goal here is to avoid "core" being a global variable, that might unexpectedly conflict somewhere else]

get_core <- function() {
  c("ggplot2", "tibble", "tidyr", "readr", "purrr", "dplyr", "stringr", "forcats")
}

core_loaded <- function() {
  search <- paste0("package:", get_core())
  get_core()[search %in% search()]
}

nick · November 22, 2017, 3:56pm

How would it unexpectedly conflict if it's not exported from the package? Changing something into a function doesn't make it less likely to conflict; core and get_core are both objects that shouldn't conflict if they aren't exported.

mauro_lepore · November 22, 2017, 4:11pm

Thanks Nick! You made me realize that I should have clarified that I was thinking about conflicts within the package (which might cause bugs and indirectly affect users too). I had in mind this quote:

"Anything you can do with global data, you can do better with access routines. The use of access routines is a core technique for implementing abstract data types and achieving information hiding."
-- "Code Complete (Developer Best Practices)" by Steve McConnell

In this particular case, I believe that the intent of the code is not to hide but to expose information -- that is, to make obvious what the content of "core" is.

nick · November 22, 2017, 4:53pm

The general point about get_core and core both being objects still applies, however. In R, a function is essentially a "global" (for the relevant namespace) variable that happens to work as a function. So I could redefine get_core later in the code, and R wouldn't complain -- unlike in many standard compiled languages, which is where I suspect that best practice comes from. The package writer has to keep track of what is pre-existing and avoid reusing it, or if it is redefined within a function, not forget that it has been redefined.
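
A small sketch of that point, assuming a throwaway environment (here called ns, a made-up name) as a stand-in for a package namespace while its R files are being sourced:

```r
# A throwaway environment standing in for a package namespace (assumption:
# this only mimics what happens while a package's R files are evaluated).
ns <- new.env()

evalq({
  core <- c("ggplot2", "tibble")
  get_core <- function() c("ggplot2", "tibble")
}, ns)

# Both names are ordinary bindings in the same environment.
bindings <- sort(ls(ns))

# Rebinding the function name later raises no error or warning.
evalq(get_core <- function() character(0), ns)
after <- evalq(get_core(), ns)
```

Here bindings lists both core and get_core side by side, and the second definition of get_core silently replaces the first, so wrapping the vector in a function does not by itself protect against an accidental redefinition.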

Now, if tidyverse:::core was being modified within the functions, that would be a bigger problem. In this case, you can almost treat it as a C-style preprocessor macro definition, as it never changes after creation.
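
As a rough illustration of the "never changes after creation" idea: once a package is loaded, R locks the bindings in its namespace, so an accidental reassignment at run time would fail. The sketch below imitates this with lockBinding() on an ordinary environment; ns and its contents are placeholders, not the tidyverse namespace itself.

```r
# Placeholder environment and values, for illustration only.
ns <- new.env()
assign("core", c("ggplot2", "tibble"), envir = ns)

# Lock the binding, roughly as R does for namespace bindings after loading.
lockBinding("core", ns)

# Attempting to reassign a locked binding signals an error.
result <- tryCatch(
  {
    assign("core", "something else", envir = ns)
    "modified"
  },
  error = function(e) "blocked"
)

# The original value is left untouched.
unchanged <- identical(get("core", envir = ns), c("ggplot2", "tibble"))
```

This only guards against rebinding the name. It is a sketch of the constant-like behaviour described above, not a statement about how the tidyverse package is implemented.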