I teach a series of short workshops that cover R essentials, data manipulation with dplyr, tidying with tidyr, plotting with ggplot2, and other domain-specific (bioinformatics) topics.
I begin by asking students to install the tidyverse package and load it at the top of nearly every script. This conveniently loads dplyr, tidyr, readr, and ggplot2, but it introduces complexity from the very beginning: newcomers are already wrestling with R, RStudio, and the idea of packages, and may be writing code for the first time. On top of all this, I then need to explain the tidyverse package as a kind of “meta-package” that installs and loads lots of other packages. This further obscures the fact that they’re using functions from specific packages: `filter` from dplyr, `gather` from tidyr, `read_csv` from readr, etc.
You could argue that it doesn’t matter in the beginning, but when I later teach other classes with Bioconductor packages, I run into a namespace issue where I have to explain that a student needs to use `dplyr::filter()` instead of the `filter()` from the Bioconductor package.
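To make the masking problem concrete, here is a minimal base-R sketch that needs no tidyverse or Bioconductor install. It is a simulation, not workshop code: the environment `bioc_like` is hypothetical and stands in for a package attached later in the session, and `stats::filter()` plays the role of the function being masked (loading dplyr masks `stats::filter()` in the same way). An unqualified `filter()` call resolves to whichever attached package comes first on the search path, while the `pkg::fun()` form always calls the named package's function.

```r
# Hypothetical stand-in for a later-attached package (e.g. a Bioconductor
# package) that exports its own filter(); not a real package.
bioc_like <- new.env()
assign("filter",
       function(...) "filter() from the later-attached package",
       envir = bioc_like)

# attach() places the stand-in at position 2 of the search path, ahead of
# package:stats, and prints a "masked from" message for filter
attach(bioc_like, name = "bioc_like", warn.conflicts = TRUE)

# An unqualified call now finds the stand-in's filter() first
unqualified <- filter(1:5, rep(1, 3))
print(unqualified)

# The pkg::fun form skips the search path and calls stats' moving-window
# filter directly; the ends are NA because the window runs off the data
qualified <- as.numeric(stats::filter(1:5, rep(1, 3)))
print(qualified)

# Remove the stand-in from the search path again
detach("bioc_like")
```

In a real session, assuming the Bioconductor package exports its own `filter()`, the same search-path rule decides which `filter()` a student gets; `conflicts(detail = TRUE)` lists which attached objects are masking others.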
Finally, a few of my students using Windows have had problems with installation and loading, getting an odd error message along the lines of:
`Error : object 'as_factor' is not exported by 'namespace:forcats'`
My question is this: For beginners, is it better to teach installing/loading the tidyverse package, or installing/loading individual packages as needed?