Using `recode_factor` With Quasi-quotation

Hello RStudio Cognoscenti,

I'm trying to use quasi-quotation with dplyr::recode_factor(). I would like to dynamically re-map (recode) the individual values AND reorder the factor levels. In the reprex below, the original values are A, B, and C, and the levels are A B C (the default). I would like to map A -> Amigo, B -> Be, and C -> Cool, as well as reorder the factor levels to Be Cool Amigo.

Using dplyr::recode() works with quasi-quotation, but it ignores the level order defined by level_key. Using quasi-quotation with dplyr::recode_factor(), however, breaks. I can achieve the re-leveling with recode_factor() by hard-coding the arguments, but that will not work for the implementation I have in mind.

Am I using !!! incorrectly? Is this a bug? I do realize I could solve this in two steps. Apologies in advance if this has already been addressed and I've missed it.

Thank you,
Stu

  library(magrittr)
  library(dplyr)
  set.seed(100)
  x <- sample(head(LETTERS, 3), 10, replace = TRUE) %>% factor
  x
  level_key <- list("Be", "Cool", "Amigo") %>% set_names(c("B", "C", "A"))    # set factor re-level 
  level_key
  dplyr::recode(x, !!!level_key)            # no re-leveling; desired B C A :(
  dplyr::recode_factor(x, B = "Be", C = "Cool", A = "Amigo")   # correct levels; but not quasi-quotation :(
  dplyr::recode_factor(x, !!!level_key)     # errors out :(

This is very puzzling indeed. I'm not sure why you are getting the error. I've even looked at qq_show(), and the output for both commands is identical, so from my point of view it should work as well:

> rlang::qq_show(dplyr::recode(x, !!!level_key))
dplyr::recode(x, B = "Be", C = "Cool", A = "Amigo")
> rlang::qq_show(dplyr::recode_factor(x, !!!level_key)) 
dplyr::recode_factor(x, B = "Be", C = "Cool", A = "Amigo")

Now it's even more puzzling! Do you also get the same error as I do?

I'm getting this one:

> dplyr::recode_factor(x, !!!level_key)
Error in !level_key : invalid argument type

Yep! That's the one! ... unfortunately :exploding_head:

The problem is buried inside recode_factor(). Only functions that are implemented to use quosures can properly interpret !!!. recode_factor() itself is implemented to use quosures, but the c() call it uses internally is not.

Here is where the problem is in recode_factor:

dplyr::recode_factor
#> function (.x, ..., .default = NULL, .missing = NULL, .ordered = FALSE) 
#> {
#>     recoded <- recode(.x, ..., .default = .default, .missing = .missing)
# -----------------------------The problem is the c() function here ---
#>     all_levels <- unique(c(..., recode_default(.x, .default, 
#>         recoded), .missing))
#>     recoded_levels <- if (is.factor(recoded)) 
#>         levels(recoded)
#>     else unique(recoded)
#>     levels <- intersect(all_levels, recoded_levels)
#>     factor(recoded, levels, ordered = .ordered)
#> }
#> <environment: namespace:dplyr>

Created on 2018-03-03 by the reprex package (v0.2.0).

The problem is in the combine function, c(). Although !!! looks like an operator in !!! level_key and like a function in !!!(level_key), it is neither.

It depends on the function that receives !!! level_key in its ... argument doing something like quos(...) so that standard evaluation is bypassed. It is quos() (or one of its friends, such as enquo() for a single argument) that interprets !!! and turns a list into individual dot arguments.

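But c() is a primitive function, so it just performs standard evaluation of !!! level_key. Base R parses that as three applications of the logical-not operator !, and negating a list of strings fails at run time with "invalid argument type".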

Here is an example of a function that tries to use !!! with c():


suppressPackageStartupMessages(library(tidyverse))
f2 <- function(...) {
    c(...)
}
level_key <- list("Be", "Cool", "Amigo") %>% set_names(c("B", "C", "A"))    # set factor re-level 

f2(!!! level_key)
#> Error in !level_key: invalid argument type

Created on 2018-03-03 by the reprex package (v0.2.0).
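For contrast with the f2 example above, here is a minimal sketch of a wrapper that does understand !!!. It assumes a reasonably recent rlang that exports list2(); the wrapper name f3 is made up for illustration and is not part of dplyr.

```r
library(rlang)

level_key <- list(B = "Be", C = "Cool", A = "Amigo")

# In plain base R, `!!!` is nothing more than three logical negations.
base_result <- !!!TRUE
base_result

# Hypothetical wrapper: rlang::list2() collects the dots and splices
# any `!!!` argument into individual named elements before c()-style
# flattening happens.
f3 <- function(...) {
  unlist(rlang::list2(...))
}

spliced <- f3(!!!level_key)
spliced
```

The only difference from f2 is that the dots pass through an rlang collector instead of going straight to c(), which is the kind of change recode_factor() would need internally.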

I don't see a workaround for this.


Hmmm. Thank you for your clear explanation.

It seems odd to me that dplyr::recode() and dplyr::recode_factor() have differing behaviors given the same input syntax. Do you think it's worth creating an Issue on the dplyr GitHub repo? I think your research and explanation clearly demonstrate that there's an issue, I'm just surprised I'm the first one to stub their toe on it.

Thoughts on next step(s)? For now I could easily refactor in a second step, though I doubt this behavior was the intended one when dplyr::recode_factor() was first conceived (tho I concede that non-standard evaluation was probably not at the forefront of the design at the time).
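As a rough sketch of what such a stopgap could look like on an affected dplyr version, here are two options. The first is the two-step refactor: recode with !!!, then set the level order from level_key. The second uses base R's do.call(), which turns the list into ordinary named arguments before recode_factor() is called, so !!! never reaches c(). The small vector x is chosen for illustration, and neither option has been checked against every dplyr release.

```r
library(dplyr)

x <- factor(c("A", "B", "C", "A"))
level_key <- list(B = "Be", C = "Cool", A = "Amigo")

# Option 1: two steps. recode() handles `!!!`; factor() then imposes
# the level order given by the values of level_key.
recoded <- dplyr::recode(x, !!!level_key)
two_step <- factor(recoded, levels = unlist(level_key, use.names = FALSE))

# Option 2: do.call() builds recode_factor(x, B = "Be", C = "Cool", A = "Amigo")
# from the list, so no quasi-quotation is involved at all.
one_call <- do.call(dplyr::recode_factor, c(list(x), level_key))

levels(two_step)
levels(one_call)
```

Both results should carry the levels in the order Be, Cool, Amigo.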

Thanks again.

Already added an issue to tidyverse/dplyr.

Here is the issue: https://github.com/tidyverse/dplyr/issues/3390

I think that quosures in general are not widely used yet, which may be why no one else has reported the issue before... or, perhaps more likely, they have run into it and just assumed that they were doing something wrong :grin: .

Over time I think the tidyverse functions will get much more robust with respect to quosures but at the moment it seems to need some smoothing around the edges.


All better now if you install the dev version of dplyr!

suppressPackageStartupMessages(library(tidyverse))
set.seed(100)
x <- sample(head(LETTERS, 3), 10, replace = TRUE) %>% 
  factor
level_key <- list(B = "Be", C = "Cool", A = "Amigo")
recode_factor(x, !!!level_key)
#>  [1] Amigo Amigo Be    Amigo Be    Be    Cool  Be    Be    Amigo
#> Levels: Be Cool Amigo

Created on 2018-03-26 by the reprex package (v0.2.0).
