Use of .data in tidyselect expressions is now deprecated

In tidyselect 1.2.0, there is a new change:

  • Use of .data in tidyselect expressions is now deprecated to more cleanly separate tidy-select from data-masking. Replace .data$x with "x" and .data[[var]] with any_of(var) or all_of(var).
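For illustration, here is a minimal sketch of the replacements the changelog describes, applied to select(). The data frame `df` and the variables `var` and `vars` are made up for this example and do not come from the changelog.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(x = 1:3, y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))

# Deprecated since tidyselect 1.2.0 (emits a deprecation warning):
#   select(df, .data$x)
# Replacement: name the column with a string.
sel_fixed <- select(df, "x")

# Deprecated:
#   select(df, .data[[var]])
# Replacement: all_of() is strict and errors if a name is missing.
var <- "y"
sel_strict <- select(df, all_of(var))

# any_of() is lenient and silently skips names that are not present.
vars <- c("y", "not_a_column")
sel_lenient <- select(df, any_of(vars))
```

Choosing between all_of() and any_of() comes down to whether a missing column should be an error or be ignored.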

I have been using .data$x inside my functions, so now all of them generate warnings. After I blindly replaced all .data$x with "x", I realized this is not appropriate in all cases. While this works for select(), it does not make sense for mutate(). Will .data be deprecated in other cases as well?

My understanding is:

  • "tidyselect expressions" are situations where you'd use any of the {tidyselect} functions like all_of(), contains(), starts_with(), and so on. So functions like select() and across().

  • "data-masking" are other situations where you provide an unquoted column name to be operated on. So functions like mutate(), summarise(), etc.

So my understanding is that it's only functions like select() where updates need to be made because, as you say, if you just used "x" in mutate() you'd end up with a column containing the literal character string "x".
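A small sketch of that difference, assuming a made-up one-column data frame `df` and a hypothetical `col_name` variable: in a data-masking verb like mutate(), a string is just a value, while .data keeps working there.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(x = 1:3)

# Data-masking context: "x" is an ordinary string, recycled down the column.
literal <- mutate(df, new = "x")

# .data is still fine in data-masking contexts such as mutate().
col_name <- "x"
masked <- mutate(
  df,
  doubled = .data$x * 2,          # column referred to by a fixed name
  plus_one = .data[[col_name]] + 1  # column name stored in a variable
)
```

So a blanket search-and-replace of .data$x with "x" is only safe inside selection contexts.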

Communication hasn't been great on this update, normally we'd get a blog post or something to explain what is going on. Perhaps one is forthcoming.

It appears that this blog post, Programming with dplyr • dplyr, has been rewritten to describe the situation. From what I can tell, it was redeployed in the last couple of days.

There is a blog post in preparation at tidyselect 1.2.0 by lionel- · Pull Request #600 · tidyverse/tidyverse.org · GitHub. We'll merge it soon.

It would be great to hear some reasoning behind this change. In my opinion, .data$x and .data[[x]] provided at least a somewhat consistent way to refer to variables in the data within package code, irrespective of whether you were filtering, mutating, selecting, pulling something out of the data frame, or pivoting. Now we also need to mix in "x", any_of(x) and all_of(x) depending on what we are doing, and at least for me it is not always clear what works and when.

Regarding .data$x, we think that select("x") and pivot_longer(c("x", "y")) are so much cleaner than select(.data$x) and pivot_longer(c(.data$x, .data$y)) that we prefer to converge on a single syntax for selections that is easy to read.

Regarding .data[[var]], while it brings consistency with data-masking contexts, it causes inconsistencies within selection contexts where sometimes you'd use all_of() and sometimes .data. We favour consistency within selection contexts.

Also we think distinguishing data-masking and selection contexts is a feature because they behave so differently in general.
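As a rough sketch of how the two contexts can sit side by side in package-style code: the helper `keep_above()` and its arguments are hypothetical names invented for this example, and mtcars is used only because it ships with R.

```r
library(dplyr)

# Hypothetical helper: column names arrive as character strings.
keep_above <- function(data, value_var, threshold, keep_vars) {
  data %>%
    # Data-masking context: refer to the column through .data.
    filter(.data[[value_var]] > threshold) %>%
    # Selection context: use all_of() for names held in variables.
    select(all_of(c(keep_vars, value_var)))
}

res <- keep_above(mtcars, value_var = "mpg", threshold = 30, keep_vars = "cyl")
```

Here filter() evaluates an expression on column values, so it takes .data[[...]], while select() only picks columns by name, so it takes all_of().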

And how far back does this go? Deprecation was added in tidyselect 1.2.0, but when was the new method introduced?

I wouldn't be surprised if "x" had always worked. all_of() was introduced in January 2020.

Something like all_of() is a little easier to search for.

I am not sure if "x" always worked. I think I made my first proper R package around 2018 and ran into tidy-select/data-masking issues (properly setting everything up to pass R CMD check can be tricky). I had to look into this topic in more depth for the first time then and don't remember seeing "x" as an option mentioned anywhere. I would not have used .data$x otherwise. Of course, it's certainly possible that it just wasn't explicitly mentioned anywhere.

yup "x" hasn't been a recommended option until recently.

Thanks, Lionel, for the explanations. I see your point. I wish the blog post had come online at the same time as (or before) the update; that would likely have reduced the initial confusion and negativity.
