Is it a good idea to conditionally load/install libraries at the beginning of a script?

DarioBoh · September 29, 2018, 11:41pm

Say you want to quickly set someone up to run a script that requires a few packages they might not have installed. Is there anything wrong with loading each package with something like this:

loadInstall <- function(x) {
  if (require(x, character.only = T)) return(require(x, character.only = T))
  else {install.packages(x, type="source")
    loadInstall(x)}
}
x <- 'tidyverse'
loadInstall('tidyverse')

Unsure whether to tag the question as 'teaching' because that would be my use case.

jcblum · September 30, 2018, 12:30am

I think context is everything with this particular question, so if you are talking about doing this in the context of teaching then yeah — I’d move it to #teaching. In general, I think it would help to have more info on the setting in which you envision doing this, and the problem you’re trying to solve.

I would personally not use this specific code. If anything goes wrong, either the recursion risks trapping a user (whom you already don’t trust to install packages) in an endless loop of package installation, or the whole function errors out, making it moot anyway. (Take a quick skim through our #package-installation tag if you need to convince yourself that things can go wrong in all sorts of ways when installing packages!) Along these lines, note also that specifying type = "source" might not be a great idea, since for any package containing compiled code this will fail on most Windows and Mac systems belonging to “average” useRs (who will not have installed the necessary build tools).
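To make the recursion concern concrete, here is a minimal sketch of a non-recursive variant. The function name load_or_install and its installer argument are hypothetical, introduced only for illustration. It attempts at most one install with the default package type, then stops with a readable error instead of looping.

```r
# Hypothetical helper: try at most ONE install, then fail loudly.
# `installer` is an assumed parameter so the install step can be swapped out.
load_or_install <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)  # default `type`, so binaries are used where available
  }
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' could not be installed; please ask for help.",
         call. = FALSE)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage with a package that ships with R, so nothing gets installed here.
load_or_install("stats")
```

This still changes the user's library without asking, so it addresses only the endless-loop problem, not the consent question.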

If you’re interested in talking more generally about some sort of code along these lines (maybe written a bit more safely), then I think that something like this can be totally okay in some circumstances, and can be anywhere from awfully rude to a complete disaster in other circumstances. It depends as much on human communication and shared understanding as it does on the technical scenario.

Your code is potentially making significant changes to the recipient’s environment (e.g., the whole tidyverse is a lot of packages when you include all the dependencies) and not cleaning up after itself, so speaking for myself, I would want to have the recipient’s informed consent before I did that.

In a teaching context, it’s also worth thinking about what you are communicating with your own actions and code and whether you want to pass those practices on to whoever you’re teaching. Learners tend to imitate the code practices they see their teachers use, and won’t know which parts are risky (and in what contexts) unless you call that out for them.

DarioBoh · September 30, 2018, 1:21am

@jcblum thanks for the response, I relabeled the question as "teaching". The context is an undergrad class where most of the students won't have the packages needed to run the code: maybe they skipped a class, or it's simply their first time using that package. Others may have used R before and could already have those packages.

I would definitely appreciate it if you could show a safe solution along those lines. As for whether this is a good idea, to me the issue is related to reproducibility: wouldn't the code be more reproducible if students don't have to manually install the packages? That way, if someone shows up 10 minutes late or runs the code by themselves, they don't get stuck wondering "what is this error, there is no function xxx?". Or maybe I could add custom error messages.
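The custom-error-message idea could look something like the sketch below. The name library_or_explain is hypothetical. It never installs anything; it only swaps R's default error for one that tells the student what to type.

```r
# Hypothetical helper: attach a package, or stop with an instructive message.
# It deliberately does NOT install anything on the student's machine.
library_or_explain <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("This script needs the '", pkg, "' package.\n",
         "Install it once with install.packages(\"", pkg, "\"), ",
         "then run the script again.",
         call. = FALSE)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage with a package that ships with R.
library_or_explain("stats")
```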

But the odds of something going wrong anyways probably don't make a solution along those lines worth it.

JohnMount · September 30, 2018, 2:55am

I'd advise that packages only be installed by the users themselves, though they can be supplied with a script to make that easier. My plan when teaching is one script that does all installs, while all other scripts/Rmarkdowns do no installs. The issue is that installing can have consequences.
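A single install script along these lines might be sketched as follows. The file name, the package list, and the helper find_missing are all assumptions for illustration. It installs only when run interactively and only after the user says yes.

```r
# install_course_packages.R (hypothetical file name)
# Assumed package list; replace with whatever the course actually needs.
course_packages <- c("ggplot2", "dplyr", "tidyr")

# Return the subset of `pkgs` that is not installed yet.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing(course_packages)

if (length(missing_pkgs) == 0) {
  message("All course packages are already installed.")
} else if (interactive()) {
  # Ask before changing the user's library.
  question <- paste0("Install ", paste(missing_pkgs, collapse = ", "), "?")
  if (isTRUE(utils::askYesNo(question))) {
    install.packages(missing_pkgs)
  }
} else {
  # In a non-interactive run, only report what is missing.
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Every other script or notebook can then stick to plain library() calls.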

olyerickson · September 30, 2018, 9:42am

We usually provide the students with an example RMarkdown Notebook that provides the overall structure we're looking for in the notebooks they'll be handing in (as rendered PDF) throughout the term. Depending on the course, these notebooks might contain exemplars for some methodology.

At the beginning of these notebooks we always provide a section where we ask them to conditionally load all the packages they'll be using which looks like the following:

# This code will install required packages if they are not already installed
# ALWAYS INSTALL YOUR PACKAGES LIKE THIS!
if (!require("ggplot2")) {
   install.packages("ggplot2")
   library(ggplot2)
}
if (!require("tidyverse")) {
   install.packages("tidyverse")
   library(tidyverse)
}

I'd strongly recommend this style, in particular because it encourages the students to think about reproducibility; we want them to think about their notebooks being executed by someone else.

nwerth · October 1, 2018, 1:38pm

I'm of the mind that, when teaching people, one shouldn't start them in an unrealistic environment.

R has so many coding styles, and the community rarely agrees on any single point, but one is so idiomatic I rarely see it discussed, let alone debated: all necessary packages are loaded at the top of the script with the library() function.

Users need to learn about R's packaging system, which means they need to learn how to install packages. Luckily, this is super simple in R. Here's how the process should work:

1. They see Error in library(tidyverse): there is no package called 'tidyverse'
2. They'll ask you what happened.
3. You give a simple explanation about packages ("They provide additional functions and data"), how to install them, and how to load them.
4. They install.packages("tidyverse"), load it with library(tidyverse), and move on.
5. Whenever they see the error message again, they know how to fix it.

Of course, a kind teacher would start at step 3.

Not to mention conditional loading of packages is a mental tax. If I see

library(tidyr)
library(dplyr)
library(forecast)
library(ggplot2)

I have a good idea of what's coming up, and it only took a tiny fraction of a second to read with minimal mental effort. But defining a function with conditional execution and functions that use optional arguments? It's unnecessary baggage.

In the worst case, the student will think this is acceptable code for everyday R scripts.

nwerth · October 1, 2018, 1:44pm

R's packaging system handles installing dependencies. Just make a package that lists all the packages to load in the Depends field of its DESCRIPTION file. Then stuff the package with functions, datasets, and whatever else they need to follow along. You could even use vignettes instead of handouts.
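As a rough sketch of what that could look like, the snippet below writes a minimal DESCRIPTION file for a course package into a temporary directory. The package name, version, and dependency list are all made up for illustration.

```r
# Hypothetical course package skeleton, written to a temporary directory.
pkg_dir <- file.path(tempdir(), "stats101tools")  # assumed package name
dir.create(pkg_dir, showWarnings = FALSE)

desc_path <- file.path(pkg_dir, "DESCRIPTION")
writeLines(c(
  "Package: stats101tools",
  "Version: 0.0.1",
  "Title: Course Materials for an Introductory Class",
  "Description: Functions and datasets used in class.",
  "License: MIT + file LICENSE",
  # Packages listed under Depends are attached when the course package is.
  "Depends: R (>= 3.5.0), ggplot2, dplyr"
), desc_path)

# Read the field back to check that the file parses as expected.
read.dcf(desc_path, fields = "Depends")
```

With this in place, students would need only a single library() call for the course package, and the packages under Depends are attached along with it.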

yutannihilation · October 2, 2018, 4:44am

The pacman package might be handy for this purpose. pacman::p_load(x) is roughly equivalent to

if (!require("x")) {
   install.packages("x")
   library(x)
}

But, for teaching, I agree with this opinion:

Users need to learn about R's packaging system, which means they need to learn how to install packages.

jcblum · October 10, 2018, 9:52pm

I’m still mostly dubious about this idea in the context of teaching (I think something like RStudio Cloud is a better model for when you want to get people up and running fast without troubleshooting a bunch of environment setup), but for anybody who is interested in going this route, this recent post by Yihui might be useful:

mine · October 25, 2018, 6:04pm

I second @jcblum's idea of using something like RStudio Cloud or similar cloud based access through RStudio Server Pro for teaching purposes, especially if the goal is to avoid discussing package installation on day one.

However, if the goal is to discuss packages, then having students install them makes sense, in which case I think the conditional-install syntax might be more to explain than just having them run install.packages().

olyerickson · October 25, 2018, 6:42pm

As I noted above, we've found it reinforces reproducibility "best practices" to teach the students to always conditionally load packages, and indeed to do this in one place in their notebooks, or perhaps in separate scripts they source.

TBH, simply using RStudio Server reduces a lot of the potential entropy; I couldn't imagine teaching our courses with the students on individual machines. Our biggest support issues are related to RStudio not creating folders and files with permissions that are consistent with the file system flags we've set. Teaching the students to manage permissions via the Linux shell is much harder than R package management!

Don't get me started regarding the problems caused when knitr barfs in a shared directory.

chris.prener · October 26, 2018, 2:03am

I thought I'd weigh in here @DarioBoh - I agree with @jcblum and @nwerth - I would shy away from the approach you've proposed.

I also teach with R and have gone through a couple different iterations of how I approach this issue. I now:

1. Point students to a software setup page on my course's website that includes install.packages(c(...)) syntax they can copy and paste into their R session. The first day of class, we talk about how install.packages() works and install cowsay, which isn't included in the pre-made syntax for package installations.
2. Provide students with handouts that list the packages and functions we'll be discussing during each course meeting. If they do come in late, that allows them to quickly figure out which packages are critical for that lecture's content.

The first approach pre-supposes that you have a solid list of your package needs before the course starts, so I actually started using the handouts during my first semester teaching with R. The second semester, I was able to then synthesize all the package references on the handouts, add some new ones, and create the install.packages(c(...)) syntax for my students to copy.
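Synthesizing the per-handout package lists into one copy-and-paste line can itself be scripted. This is only a sketch; the lecture names and package lists below are invented for illustration.

```r
# Assumed per-lecture package lists, as they might appear on the handouts.
lecture_packages <- list(
  week1 = c("ggplot2", "dplyr"),
  week2 = c("dplyr", "tidyr")
)

# Collapse to a sorted, de-duplicated vector of package names.
all_pkgs <- sort(unique(unlist(lecture_packages, use.names = FALSE)))

# Build the install.packages(c(...)) line for the course website.
install_call <- paste0(
  "install.packages(c(",
  paste0('"', all_pkgs, '"', collapse = ", "),
  "))"
)
cat(install_call, "\n")
```

The script only prints the line; students still run the install themselves.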

I've wanted to use Docker, or at least offer it as an option for students who don't want to manage a local install, but have not had the IT support at my institution to implement this approach. That is still on the "someday" list for additions to my course software infrastructure.