Naming: do we call it `core tidyverse` vs.?

Thank you for this thread! Relatedly, I have a reviewer who is insisting "tidyverse is not an R package it is a collection of packages" ...

I do believe it is both a package itself and a collection of packages. I feel a bit nuts asking, but this is correct, right? Is there something going on in the community about trying to avoid calling tidyverse a package? I didn't find any info, but I'm trying to give the reviewer the benefit of the doubt.

For full disclosure, here is the sentence in my methods.
"The dataset was reshaped and analyzed to obtain aggregate issue-level, article-level, and database-level metrics using base functions in the statistical programming language R as well as the packages tidyverse (Wickham 2017b) and stringr (Wickham 2017a)."

@1heidi you've hit on part of what I'm trying to nail down too. This is not a crazy question at all. I asked because I am writing a book on R and I want to express this properly. The language here is a little confusing, TBH.

I am calling everything that gets attached by `library(tidyverse)` "Tidyverse Core" or "Tidyverse Base".
Then everything else that gets installed by `install.packages('tidyverse')` I'm calling "Tidyverse non-core" or "Tidyverse Recommended".

The second options there (Base & Recommended) follow the R naming conventions for base & recommended packages. I personally kinda like core and non-core.

At least that's where my head is today. It might change by tomorrow. And that has nothing to do with appeasing your reviewer, of course.

Thanks for the reply! That does complicate things, but I can see how the distinction will be especially important for a book. The small bright side is you'll have the space to provide actual definitions. Maybe no matter what I call it in the paper, it'll be a little confusing since the names aren't entirely pinned down. It seems a little odd, but maybe I should just cut to the chase and reference the call directly. How weird is this:

"The dataset was reshaped and analyzed to obtain aggregate issue-level, article-level, and database-level metrics using base functions in the statistical programming language R as well as the package stringr (Wickham 2017a) and Tidyverse Base loaded via library(tidyverse) (Wickham 2017b)."

I have made the data, documentation, and scripts openly available though, so anyone who is truly interested in my methods will refer to them and not the article's methods sections. So hopefully that will help alleviate some confusion.

Thanks again.

I talk about the tidyverse CRAN package as a "meta package". It is a convenience for installing a bunch of packages at once. If you attach it via library(tidyverse), it is a convenient way to attach a subset of those packages at once -- namely, the "core packages".
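
To make the install vs. attach distinction concrete, here is a minimal sketch. It assumes the tidyverse package is already installed and that you run it in a fresh R session; `newly_attached()` is a hypothetical helper written for this illustration, not part of any package.

```r
# Hypothetical helper: which packages appear on the search path
# after an attach call that were not there before?
newly_attached <- function(before, after) {
  sub("^package:", "", setdiff(after, before))
}

# Only run the comparison if the tidyverse meta package is installed.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  before <- search()
  suppressPackageStartupMessages(library(tidyverse))

  # Packages attached by library(tidyverse), minus the meta package itself.
  # In a fresh session this should be the "core" set.
  core <- setdiff(newly_attached(before, search()), "tidyverse")

  # Everything the meta package depends on, whether attached or not.
  all_pkgs <- tidyverse::tidyverse_packages(include_self = FALSE)
  non_core <- setdiff(all_pkgs, core)

  print(core)
  print(non_core)
}
```

The exact contents of both sets depend on which tidyverse version is installed, so no output is shown here. Packages that were already attached before the call will not show up in `core`, which is why a fresh session matters.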

As for your reviewer, I guess I can see both sides. You definitely could spell out which tidyverse packages you actually use. But I tend to agree with you. If you're using a function or two each, from several core packages, it seems sensible and concise to say that you used the tidyverse (referring to the meta package).

Thank you, Jenny! It's helpful to think about it as a meta package. I have the attached packages listed in my README, so I'll keep smithing the methods to be as clear but short as possible and then reiterate that detail should be sought in the documentation that accompanies the dataset & scripts. The methods will get unreadable really fast otherwise.

(While I'm here - thank you (& the TAs) so much for HappyGitwithR!)

I think using "Base" for anything other than base R would be super confusing. I know you're not taking a poll here, but what is a forum on the internet for if not giving unsolicited opinions? :stuck_out_tongue_winking_eye:

I like "core" and "more". "Tidy by Nature" sounds like a great idea for the series of R t-shirts that I'll design one of these days.

Regarding how to reference tidyverse and other packages in papers: what you want in the methods section of a paper is replicability, so you usually need to refer not only to the tools you are using but also to their versions, because things can change significantly across versions.

In that regard, if you reference a given version of the tidyverse meta-package, let's say 1.2.0, does it imply a given version of each of the underlying packages or does tidyverse just load the latest version available? If it is the second case, then I guess your reviewers are right @1heidi and what you should really do is document the versions of the underlying packages.
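
One way to check this on your own machine and to record the versions actually used is sketched below. `pkg_versions()` is a hypothetical helper written for this illustration; the tidyverse part assumes the meta package is installed and is skipped otherwise.

```r
# Hypothetical helper: named character vector of installed versions.
pkg_versions <- function(pkgs) {
  vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
}

# Works for any installed packages, e.g. ones that ship with R.
print(pkg_versions(c("stats", "utils")))

if (requireNamespace("tidyverse", quietly = TRUE)) {
  # The Imports field of the meta package's DESCRIPTION shows what,
  # if anything, it requires of each underlying package's version.
  cat(utils::packageDescription("tidyverse")$Imports, "\n")

  # Versions of the meta package and the packages it depends on,
  # as installed in this library.
  print(pkg_versions(tidyverse::tidyverse_packages()))
}
```

The output of the last call (or of `sessionInfo()`) is the kind of detail that can go into a README or supplementary material when the methods section is short on space.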

Also, you might want to consider the issue of citations for package authors. By citing tidyverse and Hadley you are not citing the authors of the underlying packages (which are not all by Hadley) when it could be considered that the real work was done by them. I'm not saying Hadley does not deserve citations :wink: but others do too.
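
If it helps, the citation each package asks for can be pulled from R itself. This is only a sketch: `cite_packages()` is a hypothetical helper, and the example uses packages that ship with R so that it runs anywhere; swap in the packages you actually used.

```r
# Hypothetical helper: plain-text citation(s) for each package,
# returned as a list named by package.
cite_packages <- function(pkgs) {
  lapply(
    stats::setNames(pkgs, pkgs),
    function(p) format(utils::citation(p), style = "text")
  )
}

cites <- cite_packages(c("stats", "utils"))
print(cites)

# For a reference manager, BibTeX output is also available:
print(utils::toBibtex(utils::citation("stats")))
```

Each package's citation lists its own authors, which makes it easier to see who would be credited if you cite the individual packages rather than only the meta package.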

PS: Finally, there is a small inconsistency in your methods @1heidi because stringr is core tidyverse so you should cite either just tidyverse or every core package that you use.

Thank you @jiho! I did not think about citations for the other package authors. That's a great point, and I'll double check. I've already documented package versions (along with other session info) in the README for the repo/dataset (https://github.com/1heidi/nar if curious), and I've referenced it repeatedly in the paper. That has just proved helpful ... I used tidyverse_1.1.1, which does not include stringr in the core. During analysis I attached stringr_1.2.0 separately. This does convince me I'd better add package versions to the text in the methods section, too - I'm just approaching page limits! Thanks to all for the input. Much appreciated!