Best practices for location of library() in .Rmd files

To me, it depends on the intent of the R Markdown document. If it is to teach how to do something in R, I load each library as close as possible to where I'm going to use it, so it's easy for the reader to follow where the functions come from. But if the document is intended as a report, I load all of the libraries at the top; that way I'm free to change the order of the chunks without having to worry about whether a needed library has already been loaded.

One other issue with loading them later is that, if the late-loaded package creates any conflicts, your earlier code may not re-run unless you restart your R session each time (or unload the package). It's not something I run across as often these days, but when I was pulling in a bunch of packages for one function each (and didn't have the habit of using the package::function() notation when I did), my earlier code would fail. I think there was at least one time that it just produced a different result, because the conflicting function was still valid code but produced new output.
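
A minimal sketch of that masking effect using only base R. The attached list `fake_pkg` is a hypothetical stand-in for a late-loaded package that defines its own `filter`; a real case would involve something like a package attached with library() partway through the document.

```r
# Hypothetical stand-in for a late-loaded package: a list that defines
# its own `filter`, which will mask stats::filter once attached.
fake_pkg <- list(filter = function(x, ...) "fake_pkg version")

x <- 1:5
before <- filter(x, rep(1/3, 3))      # resolves to stats::filter

attach(fake_pkg, name = "fake_pkg")   # plays the role of a late library() call
after <- filter(x, rep(1/3, 3))       # same code, now resolves to the masking version

explicit <- stats::filter(x, rep(1/3, 3))  # the namespaced call is unaffected

detach("fake_pkg", character.only = TRUE)  # clean up the search path

print(class(before))
print(after)
```

The unqualified call silently changes meaning once the second definition is on the search path, while the `stats::filter()` call keeps returning the same result.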

A "best of both worlds" option here would be to have all the library calls at the beginning and then commented-out calls in the chunks, so users know what's needed.

I tend to like that idea. But, pretty please, could we have more than one colour for comments, so that this sort of usage gets differentiated from other types of comment?

I put all the library() calls in a separate script. Then one of my first chunks in the .Rmd file is something like:

source('code/00_dependencies.R')

Now I'm nervous there is some obvious drawback to this approach that I've missed :neutral_face:
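
A minimal sketch of this pattern, written against a temporary file so it runs anywhere. The file contents are hypothetical, and tools and stats4 (both ship with R) are only stand-ins for a project's real dependencies.

```r
# Hypothetical stand-in for code/00_dependencies.R, created in a temp dir.
deps_file <- file.path(tempdir(), "00_dependencies.R")
writeLines(c("library(tools)", "library(stats4)"), deps_file)

# In the .Rmd, this call would sit in one of the first chunks.
source(deps_file)

# Packages attached by the script are now on the search path.
attached <- c("package:tools", "package:stats4") %in% search()
print(attached)
```

One trade-off to be aware of: a reader of the rendered document sees only the source() call, not the list of packages, unless the script is shown alongside it.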

An alternative way to show which packages a chunk needs would be to use the package::function() notation in the chunk itself. Combined with having your library() calls at the top of the script, this could be the clearest option for other readers, though maybe a little verbose for users familiar with the topic.
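
A minimal sketch of the notation, using packages that ship with R so it runs without installing anything. The file name is made up for illustration.

```r
# No library(tools) call is needed: the :: operator looks the function up
# in the package's namespace directly.
ext <- tools::file_ext("analysis.Rmd")

# The same notation works for packages that are already attached;
# here it simply documents where head() comes from.
first <- utils::head(letters, 3)

print(ext)
print(first)
```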

I do this as well, and if for some reason (conflicting function names, etc.) I load the library earlier, I'll refer back to the fact that it was loaded and explain the package right before using it. But I've found that (especially when they're non-core tidyverse :package:s) it's a good way to show why X package is useful (e.g. stringr, lubridate, etc.).

I think this just makes for difficult reading, especially with packages like dplyr and tidyr, where the "verbs" (functions) will have meaning to someone new to R.

I think the intention here is namespacing everything, such as dplyr::mutate, so the meaning is still there. I am trying to do this consistently with all functions except the really common tidyverse ones. I would, though, namespace tibble::add_column(), since imho it's not obvious that it's from tibble and not dplyr.

That’s a good point. I had thought of it as a solution to the problem of “where is the function from?”, but in the grand scheme it may be too verbose and actually lose the value it was meant to add.

I suppose that generally it’s best to place the ‘library()’ calls together at the top, but for specific documents and audiences it may be better to include them as needed (for example in an educational context).

I tend to always put them at the top for .R or .Rmd, except for the expository case described by @edgararuiz.

But sometimes I do add a comment mentioning why I load this package, especially if it's for one-off use of a specific function, e.g.

library(lubridate)             ## for guess_formats()

This way you also have a better chance of noticing that you're loading things you don't need to, e.g. if one day the script no longer calls lubridate::guess_formats().
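
That check can also be done mechanically. A rough sketch, assuming the script text is available as a character vector: `script` below is a made-up two-line example, and nothing in it is evaluated (it is only parsed, so lubridate does not need to be installed).

```r
# Made-up script contents; only parsed, never run.
script <- c(
  "library(lubridate)             ## for guess_formats()",
  "fmt <- guess_formats(c('2020-01-31'), 'ymd')"
)

# Parse with source references kept so token-level data is available.
parsed <- parse(text = script, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Names of every function that the script calls.
called <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])

# Is the function named in the comment still used anywhere?
still_used <- "guess_formats" %in% called
print(called)
print(still_used)
```

If `still_used` came back FALSE, the library() line would be a candidate for removal.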

source("code/00_dependencies.R")

I do this as well, mostly because I typically use only a few packages, but I use them a lot across different functions and chunks. If I use a package only once or twice, I don't load it at all, neither at the beginning of the script nor right before use; I use packagename::functionname instead. I find this useful combined with packrat: I have a few packages used in a project, then I can source my_libraries.R from any Rmd or other script within that project.

So... this all leads me to wonder: couldn't the library calls be automated? I.e. RStudio detects when a function is used from a package that is not loaded, and then adds it to a code chunk on top with all the library calls. Or adds it to the current chunk if you have the "add library() to current chunk" option clicked. If the function isn't recognized, you get a message. And of course if there's a conflict, you get a pop-up that asks "Do you want to select from the dplyr or MASS package?" Seriously, this is a task that a computer could do very well, and a human (at least this one...) cannot.

Yes, there is at least base::autoload(). I think I have also seen another package with this functionality, but I can't remember which one now.
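
A small sketch of how base::autoload() behaves, closely following the example in its help page. It uses splines, which ships with R; the data values are arbitrary.

```r
# Register a promise: the first use of interpSpline will attach splines.
autoload("interpSpline", "splines")

x <- 1:10
y <- x^2

# Evaluating the name triggers library(splines) behind the scenes.
fit <- interpSpline(x, y)

# splines is now on the search path without an explicit library() call.
now_attached <- "package:splines" %in% search()
print(now_attached)
```

Note that this only covers names registered in advance; it does not detect arbitrary unloaded functions or resolve conflicts between packages.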

I usually also call library() at the top. However, when I load a package very late, I pay close attention to any conflict (masking) messages, so as not to break any existing code. Where there is any risk, I use package::function notation.

I think it'd be a fairly easy extension of @milesmcbain's deplearning :package:

Do you mean packup? https://github.com/MilesMcBain/packup

It does what @jtr13 is suggesting already.

Then, yes, of course that's what I meant! :stuck_out_tongue_winking_eye:

Super, will check this out. Thanks everyone -- love this forum -- I am learning so much!

I tend to use a package manager for loading and managing packages: pacman

https://cran.r-project.org/web/packages/pacman/vignettes/Introduction_to_pacman.html

With pacman::p_load(), instead of five lines of code to load five common packages, like this:

library(dplyr)
library(tidyr)
library(Hmisc)
library(magrittr)
library(janitor)

you can write one line:

pacman::p_load(dplyr, tidyr, Hmisc, janitor, magrittr)

If a package is not available on the system, it will first install it (through install.packages), and only then try to load it again. The installr package offers the same behaviour:

installr::require2

All of this goes at the very top of any R file.
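
For readers who want to see the idea without either package, here is a rough base-R sketch of "install if missing, then load". It is not how pacman or installr implement it; `load_or_install` is a hypothetical helper name, and the example call uses packages that ship with R so nothing is actually installed.

```r
# Hypothetical helper: attach each package, installing it first if absent.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() checks availability without attaching the package.
    if (!requireNamespace(pkg, quietly = TRUE)) {
      # Assumes a CRAN mirror is configured (e.g. via options(repos = ...)).
      install.packages(pkg)
    }
    # character.only = TRUE lets library() accept the name as a string.
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# tools and stats4 ship with R, so this call only attaches them.
loaded <- load_or_install(c("tools", "stats4"))
print(c("package:tools", "package:stats4") %in% search())
```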

I'm generally a fan of pacman, particularly when I start using less common packages. That way, when I need to re-run my analysis down the road, it can automatically reinstall them since I've inevitably lost (or cleaned out) my packages since the last time I ran it. The "bootstrapping" script header provided by pacman::p_boot() is also convenient if you share scripts with people less familiar with R (or at least people that won't mind your script auto-installing some packages):

if (!require("pacman")) install.packages("pacman"); library(pacman)