Is it a good idea to conditionally load/install libraries at the beginning of a script?

#1

Say you want to quickly set someone up to run a script that requires a few packages that they might not have installed. Is there anything wrong with loading each package with something like this:

loadInstall <- function(x) {
  if (require(x, character.only = T)) return(require(x, character.only = T))
  else {install.packages(x, type="source")
    loadInstall(x)}
}
x <- 'tidyverse'
loadInstall('tidyverse')

I'm unsure whether to tag the question as 'teaching', since that would be my use case.


#2

I think context is everything with this particular question, so if you are talking about doing this in the context of teaching then yeah — I’d move it to #teaching. In general, I think it would help to have more info on the setting in which you envision doing this, and the problem you’re trying to solve.

I would personally not use this specific code. If anything goes wrong, the recursion either risks trapping a user (whom you already don’t trust to install packages) in an endless loop of package installation, or the whole function errors out, making it moot anyway. (Take a quick skim through our #package-installation tag if you need to convince yourself that things can go wrong in all sorts of ways when installing packages!) Along these lines, note also that specifying type = "source" might not be a great idea: for any package that needs compilation, this will fail on most Windows and Mac systems belonging to “average” useRs (who will not have installed the necessary build tools).
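To make the two concerns above concrete, here is a minimal sketch of a non-recursive variant. It is only an illustration under stated assumptions, not a recommendation: the name load_or_install and the install_fun argument are hypothetical, and the installer is a parameter only so that the behaviour can be exercised without touching a real library.

```r
# Hypothetical helper (not from the original post): tries to install a
# missing package at most once, then fails loudly instead of recursing.
load_or_install <- function(pkg, install_fun = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; attempting one install.")
    # Default `type`, so binary packages are used where the platform has them.
    install_fun(pkg)
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Could not install '", pkg, "'. Please install it manually.",
           call. = FALSE)
    }
  }
  # Attach the package, as library() would at the top of a script.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

Because there is a single install attempt followed by a plain error, a failed installation cannot loop. It still changes the user's library without asking, so the consent question is unaffected.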

If you’re interested in talking more generally about some sort of code along these lines (maybe written a bit more safely), then I think that something like this can be totally okay in some circumstances, and can be anywhere from awfully rude to a complete disaster in other circumstances. It depends as much on human communication and shared understanding as it does on the technical scenario.

Your code is potentially making significant changes to the recipient’s environment (e.g., the whole tidyverse is a lot of packages when you include all the dependencies) and not cleaning up after itself, so speaking for myself, I would want to have the recipient’s informed consent before I did that.

In a teaching context, it’s also worth thinking about what you are communicating with your own actions and code and whether you want to pass those practices on to whoever you’re teaching. Learners tend to imitate the code practices they see their teachers use, and won’t know which parts are risky (and in what contexts) unless you call that out for them.


#3

@jcblum thanks for the response, I relabeled the question as "teaching". The context is an undergrad class where most of the students won't have the packages needed to run the code: maybe they skipped a class, or it's simply their first time using that package. Others may have used R before and could already have those packages.

I would definitely appreciate it if you could show a safe solution along those lines. As for whether this is a good idea, to me the issue was related to reproducibility: wouldn't the code be more reproducible if students didn't have to install the packages manually? That way, if someone shows up 10 minutes late or runs the code by themselves, they don't get stuck wondering "what is this error, there is no function xxx?". Or maybe I could add custom error messages :smile:

But the odds of something going wrong anyway probably mean a solution along those lines isn't worth it.
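For reference, the custom-error-message idea could be sketched roughly as follows. This version only checks and reports; it never installs anything. The function name check_packages and the wording of the message are assumptions made for illustration.

```r
# Hypothetical helper: stops with a readable message listing the packages
# that are not installed, and tells the student what to run.
check_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!installed]
  if (length(absent) > 0) {
    stop(
      "Please install the missing package(s) first:\n  install.packages(c(",
      paste0('"', absent, '"', collapse = ", "),
      "))",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Usage at the top of a class script, before the library() calls:
# check_packages(c("ggplot2", "dplyr"))
```

A latecomer then sees an instruction rather than a bare "there is no package called" error, and the decision to install stays with them.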


#4

I'd advise that packages should only be installed by the users themselves, but they can be given a script to make that easier. My plan when teaching is to have one script that does all the installs, while all other scripts and R Markdown documents do no installs. The issue is that installing can have consequences.
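A one-off install script along those lines might look like the sketch below. The file name, the package list, and the helper install_missing are all hypothetical; the interactive() guard is an assumption meant to ensure the installs only happen when a person runs the script themselves (for example by sourcing it in RStudio).

```r
# install_course_packages.R (hypothetical file name): run once by the student.
course_packages <- c("ggplot2", "dplyr", "tidyr")  # assumed course list

# Installs only the packages that are not already in the library and
# returns (invisibly) the names it tried to install.
install_missing <- function(pkgs, install_fun = utils::install.packages) {
  to_install <- setdiff(pkgs, rownames(utils::installed.packages()))
  if (length(to_install) > 0) {
    install_fun(to_install)
  }
  invisible(to_install)
}

# Only install when someone is running this script interactively.
if (interactive()) {
  install_missing(course_packages)
}
```

All other scripts then contain plain library() calls and nothing else.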


#5

We usually provide the students with an example R Markdown Notebook that shows the overall structure we're looking for in the notebooks they'll be handing in (as rendered PDFs) throughout the term. Depending on the course, these notebooks might contain exemplars for some methodology.

At the beginning of these notebooks we always provide a section where we ask them to conditionally load all the packages they'll be using, which looks like the following:

# This code will install required packages if they are not already installed
# ALWAYS INSTALL YOUR PACKAGES LIKE THIS!
if (!require("ggplot2")) {
   install.packages("ggplot2")
   library(ggplot2)
}
if (!require("tidyverse")) {
   install.packages("tidyverse")
   library(tidyverse)
}

I'd strongly recommend this style, in particular because it encourages the students to think about reproducibility; we want them to think about their notebooks being executed by someone else.


#6

I'm of the mind that, when teaching people, one shouldn't start them in an unrealistic environment.

R has so many coding styles, and the community rarely agrees on any single point, but one is so idiomatic I rarely see it discussed, let alone debated: all necessary packages are loaded at the top of the script with the library() function.

Users need to learn about R's packaging system, which means they need to learn how to install packages. Luckily, this is super simple in R. Here's how the process should work:

  1. They see Error in library(tidyverse): there is no package called 'tidyverse'
  2. They'll ask you what happened.
  3. You give a simple explanation about packages ("They provide additional functions and data"), how to install them, and how to load them.
  4. They install.packages("tidyverse"), load it with library(tidyverse), and move on.
  5. Whenever they see the error message again, they know how to fix it.

Of course, a kind teacher would start at step 3.

Not to mention conditional loading of packages is a mental tax. If I see

library(tidyr)
library(dplyr)
library(forecast)
library(ggplot2)

I have a good idea of what's coming up, and it only took a tiny fraction of a second to read with minimal mental effort. But defining a function with conditional execution and functions that use optional arguments? It's unnecessary baggage.

In the worst case, the student will think this is acceptable code for everyday R scripts.


#7

R's packaging system handles installing dependencies. Just make a package that lists all the packages to load in the Depends field of its DESCRIPTION file. Then stuff the package with functions, datasets, and whatever else they need to follow along. You could even use vignettes instead of handouts.
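As a rough illustration, the DESCRIPTION file of such a course package might look like the fragment below. The package name, title, version, and dependency list are all made up for the example.

```
Package: stats101course
Title: Materials for a Hypothetical Intro Course
Version: 0.1.0
Description: Functions, datasets, and vignettes used in class.
Depends:
    R (>= 3.5.0),
    ggplot2,
    dplyr,
    tidyr
```

Packages listed under Depends are attached when the course package is loaded with library(), whereas those under Imports are installed but not attached. Note that dependencies are only resolved automatically when the package is installed from a repository or with a helper that handles them; as far as I know, install.packages() with repos = NULL on a local file does not fetch them.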


#8

The pacman package might be handy for this purpose. pacman::p_load(x) is roughly equivalent to

if (!require("x")) {
   install.packages("x")
   library(x)
}

But, for teaching, I agree with this opinion :slight_smile:

"Users need to learn about R's packaging system, which means they need to learn how to install packages."


#9

I’m still mostly dubious about this idea in the context of teaching (I think something like RStudio Cloud is a better model for when you want to get people up and running fast without troubleshooting a bunch of environment setup), but for anybody who is interested in going this route, this recent post by Yihui might be useful: