Which Packages Get Loaded?


#1

Installing tidyverse is the first thing I did on my new work computer. It installs >50% of the packages I use on a daily basis. But, when I call library(tidyverse), only a few of them load! Is there a reason some load but not the others? How did that decision get made?
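To check which packages are attached on your own machine, here is a minimal sketch. It assumes the tidyverse package is installed, and the variable names are only illustrative. It compares everything tidyverse_packages() reports against what is on the search path after library(tidyverse).

```r
library(tidyverse)

# Every package that belongs to the tidyverse (including tidyverse itself)
all_pkgs <- tidyverse_packages()

# Packages currently attached, with the "package:" prefix stripped
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Attached by library(tidyverse) (plus anything you attached yourself)
loaded_now <- intersect(all_pkgs, attached)

# Installed as part of the tidyverse but not attached
not_loaded <- setdiff(all_pkgs, attached)

print(loaded_now)
print(not_loaded)
```

The exact contents of both vectors depend on the tidyverse version you have installed.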

My friend Eric started a petition about this topic. I personally prefer stringr to lubridate, for what it’s worth: https://www.change.org/p/hadley-wickham-load-lubridate-with-library-tidyverse


#2

The goal is to load packages that you need in 90% of data analyses.

I’m likely to add forcats and lubridate to this list in the future - there are just a couple of small lubridate issues we need to resolve first.


#3

I assumed so. A data analyst who receives data that needs no string manipulation before anything else can be done is a lucky person.


#4

What would be really cool is the ability to customize what gets loaded, say, by setting an environment variable. That would complicate reproducibility, though, and could be tweaked in one’s .Rprofile right now anyway.

More practical may be a function to load a minimal set (dplyr, tidyr and ggplot2, maybe?) and a maximal one (all the top-level packages), though the option may encourage bad coding habits/loading unused packages.


#5

I’m generally terrified to mess with my .Rprofile file, but how would I manipulate that to change what library(tidyverse) loads?


#6

I wouldn’t change that (code should do what it says), but you could define a function like

load_my_packages <- function() {
    # Attach each package by name; invisible() stops lapply()'s result from printing
    invisible(lapply(c('dplyr', 'tidyr', 'ggplot2'), library, character.only = TRUE))
}

that you could call at the start of each session, or just put the actual library calls there so they’re loaded on startup. As mentioned before, neither is recommended if you’re worried about reproducibility.


#7

Gotcha. I share a lot of code with coworkers, and it’s only about 5 more lines of code at the beginning, so I’ll just keep doing library(magrittr) at the top of the files. I probably should learn how to use .Rprofile at some point, though.


#8

Mine’s actually empty at the moment aside from some commented-out .libPaths shenanigans that caused as much trouble as they saved.

I do use .Renviron to set environment variables for packages that require external resources, like reticulate and sergeant. It could also be used as a way to keep API keys out of scripts, but keyring is a much more secure alternative.
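As a rough sketch of the two approaches: the variable name MY_API_KEY, the service name "my_database" and the username "me" are all made up for illustration, and the keyring lines are left commented out because they assume the keyring package is installed and will prompt for input.

```r
# ~/.Renviron would contain a line such as (hypothetical name and value):
#   MY_API_KEY=abc123

# Read the variable in a script instead of hard-coding the key
api_key <- Sys.getenv("MY_API_KEY", unset = NA)
if (is.na(api_key)) {
  message("MY_API_KEY is not set; add it to ~/.Renviron and restart R")
}

# With keyring, the secret lives in the operating system's credential store:
# keyring::key_set("my_database", username = "me")              # prompts once
# password <- keyring::key_get("my_database", username = "me")  # retrieve later
```

Either way the secret stays out of the script itself, which matters when code is shared with coworkers.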


#9

The reason I am intimidated by .Rprofile is because of .libPaths shenanigans! And keyring is great! Before I found out about that, I was keeping my database credentials in a CSV file stored locally – not the best option!


#10

I’m chalking this up as a successful petition!


#11

FWIW I think it’s a bad idea to load data analysis packages in your .Rprofile. It increases the likelihood your code won’t be reproducible because you’ve forgotten to load an important package. (And similarly that’s why library(tidyverse) will never be customisable - it has to work the same way on everyone’s computers)


#12

You can customize what packages get loaded into R generally by using the defaultPackages argument to options() and sticking it in .Rprofile. I do this for packages like bookdown or blogdown because these are not packages that users will need to load in order to run my code.
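A minimal sketch of what that might look like in ~/.Rprofile, assuming bookdown and blogdown are installed. The point of appending is that setting defaultPackages outright would replace the usual set rather than extend it.

```r
# In ~/.Rprofile: append to the existing default set rather than replacing it,
# so the usual packages (stats, graphics, utils, ...) are still attached.
local({
  old <- getOption("defaultPackages")
  options(defaultPackages = c(old, "bookdown", "blogdown"))
})
```

The option is read during R's startup, so changing it in a session that is already running does not attach anything.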


#13

I got advice from Hadley Wickham and Roger Peng… What a day!


#14

You can register a hook (custom function) that is called when tidyverse is attached. For instance, add the following to your ~/.Rprofile file:

setHook(packageEvent("tidyverse", event = "attach"), function(...) {
    library("stringr")
    library("lubridate")
})

This avoids preloading any packages when you start R, while still causing those extra packages to be attached whenever tidyverse is attached.

PS. If you use startup::startup() in ~/.Rprofile, you can put the above in a standalone file, e.g. ~/.Rprofile.d/package=tidyverse.R, which is then processed if and only if tidyverse is installed.


#15

This has the same problem mentioned by @hadley earlier, though: you’re making your analysis harder to reproduce. At the very least, this should be wrapped in if (interactive()) {…}.
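For illustration, a sketch of the hook from the earlier post wrapped that way. It assumes stringr and lubridate are installed; the hook is only registered in interactive sessions, so scripts run with Rscript or R CMD BATCH see the stock library(tidyverse) behaviour.

```r
# In ~/.Rprofile
if (interactive()) {
  # Register the hook only for interactive sessions
  setHook(packageEvent("tidyverse", event = "attach"), function(...) {
    library("stringr")
    library("lubridate")
  })
}
```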


#16

Yeah, I think it’s best to reserve ~/.Rprofile tweaks for packages you only use interactively, such as blogdown and bookdown (as @rdpeng suggests). In my ~/.Rprofile I have:

if (interactive()) {
  suppressMessages(require(devtools))
  suppressMessages(require(usethis))
  suppressMessages(require(testthat))
}

#17

Go home and call it a day


#18

One thing that surprised me is that dplyr doesn’t appear to re-export the %<>% operator from magrittr, even though it does re-export %>%. I really like the former! But I can always library(magrittr), so it’s no biggie.
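For readers who haven't met %<>%, a small sketch, assuming magrittr is installed. It pipes the left-hand side into the function and assigns the result back to the same name.

```r
library(magrittr)

x <- c(4, 9, 16)

# Equivalent to: x <- x %>% sqrt()
x %<>% sqrt()

print(x)  # 2 3 4
```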


#19

Adding lubridate would really help me out. Hope the issues are easy to sort out.


#20

FWIW, I love stringr but encounter people who feel strongly about other string packages. They might not like a stringr default in library(tidyverse). I sense that there’s a balance here between making people’s lives easy and annoying them.