Unexpected behavior of dplyr::case_when

kent37 · June 22, 2018, 11:26am

This surprised me:

x = 1:2

dplyr::case_when(
  length(x)==1 ~ x,
  length(x)==2 ~ x[2]
)
#> [1] 2 2

It seems that case_when() is evaluating each RHS and using that to determine the length of the result. Is there a way to avoid this? In my actual use case I always expect a character(1), so I subscript the result, but that feels pretty hacky. It's easy to imagine situations where not every RHS is a valid expression in the current environment.

Any thoughts?
Kent

mara · June 22, 2018, 12:00pm

How do you expect the RHS to be evaluated? (Genuine question, I'm not quite clear!)

A couple of GitHub issues that might be illuminating, or that you might want to jump in on:

github.com/tidyverse/dplyr issue: "Document how to supply list of formulas to case_when()" (opened by lionel- at 12:04 UTC on 29 Jun 2017, closed at 18:10 UTC on 21 Nov 2018; labels: feature, documentation)

e.g. https://stackoverflow.com/questions/44822256/use-dplyrcase-when-with-argume…nts-programmatically/44823159

Formulas should be created with `exprs()` (so they are unevaluated) if they refer to data columns or `.data` / `.env` pronouns. Mention that supplying formulas directly to `case_when()` within a `mutate()` works precisely because they are captured unevaluated by `mutate()`.

(plus StackOverflow thread related to the above)
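For reference, the pattern that issue asks to document looks roughly like this. It is a minimal sketch, assuming dplyr and rlang are installed; `df`, `conds` and the size labels are made up for illustration.

```r
library(dplyr)

df <- data.frame(x = c(1, 5, 10))

# The formulas refer to the data column `x`, so they are captured unevaluated
# with rlang::exprs() and spliced into case_when() with !!! inside mutate().
conds <- rlang::exprs(
  x < 3 ~ "small",
  TRUE  ~ "large"
)
out <- mutate(df, size = case_when(!!!conds))
out$size  # "small" "large" "large"
```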

kent37 · June 22, 2018, 1:06pm

I expected the result of case_when() to be the same as the result of evaluating the relevant RHS. In my example, the second LHS is TRUE, so I expected the value to be x[2], which is 2. Instead, case_when() seems to evaluate every RHS, decide that the result should have length 2, and recycle the value to c(2, 2).
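To make that evaluation order concrete, here is a simplified model of it in base R. `case_when_sketch()` is a hypothetical helper written for illustration, not dplyr's implementation, and it skips dplyr's length and type checks. It only encodes the behaviour described above: every LHS and RHS is evaluated, the longest one sets the output length, and length-1 values are recycled.

```r
# Hypothetical helper, NOT dplyr's source: a simplified model of case_when().
case_when_sketch <- function(...) {
  formulas <- list(...)
  # Every LHS and every RHS is evaluated up front, in the formula's environment
  lhs <- lapply(formulas, function(f) eval(f[[2]], environment(f)))
  rhs <- lapply(formulas, function(f) eval(f[[3]], environment(f)))
  # The output length is the longest input; length-1 inputs are recycled
  # (dplyr also rejects lengths other than 1 or n; this sketch does not)
  n <- max(lengths(c(lhs, rhs)))
  out <- rep(NA, n)
  matched <- rep(FALSE, n)
  for (i in seq_along(formulas)) {
    # Positions where this condition is TRUE and no earlier condition matched
    hit <- rep_len(lhs[[i]], n) & !matched
    hit[is.na(hit)] <- FALSE
    out[hit] <- rep_len(rhs[[i]], n)[hit]
    matched <- matched | hit
  }
  out
}

x <- 1:2
case_when_sketch(
  length(x) == 1 ~ x,
  length(x) == 2 ~ x[2]
)
# c(2L, 2L): the RHS `x` sets the length to 2, and the single TRUE is recycled
```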

My actual use case was formatting a result that has a varying number of values. Here is an example that is closer to what I was doing:

format_x = function(x) {
  dplyr::case_when(
    length(x)==1 ~ as.character(x),
    length(x)==2 ~ paste(x[1], 'and', x[2])
  )
}

format_x(2)
#> [1] "2"
format_x(1:2)
#> [1] "1 and 2" "1 and 2"

The repeated result for format_x(1:2) is confusing and not what I intended.

I don't see the relevance of the linked issues and SO thread; they all seem to be about NSE (non-standard evaluation), which I am not using here.

mara · June 22, 2018, 1:25pm

Oh, I thought you wanted to change the evaluation of the RHS.

hoelk · June 22, 2018, 1:35pm

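case_when() is vectorized. Its output has the common length of all the LHS and RHS vectors, which in your examples is length(x), because one RHS (x or as.character(x)) has that length. Your conditions length(x) == 1 and length(x) == 2 are each a single TRUE or FALSE, not one test per element of x, so the selected RHS simply gets recycled to that length. That isn't what case_when() is designed for.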
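For contrast, a sketch of the element-wise recoding case_when() is built for, where each condition is a logical vector as long as the data (the values and labels are illustrative):

```r
library(dplyr)

x <- c(1, 5, 10)
# Each element of x gets the RHS of the first condition that is TRUE for it
size <- case_when(
  x < 3 ~ "small",
  x < 8 ~ "medium",
  TRUE  ~ "large"
)
size  # "small" "medium" "large"
```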

For your application you probably want a normal if (...) else construct rather than case_when().
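A sketch of format_x() rewritten with if/else. Only the selected branch is evaluated and its value is returned unchanged, so format_x(1:2) gives a single string. The NA_character_ fallback for other lengths is an assumption, chosen to mirror case_when() returning NA when no condition matches.

```r
format_x <- function(x) {
  # if/else evaluates only the chosen branch; nothing is recycled
  if (length(x) == 1) {
    as.character(x)
  } else if (length(x) == 2) {
    paste(x[1], "and", x[2])
  } else {
    NA_character_  # assumed fallback for lengths other than 1 or 2
  }
}

format_x(2)    # "2"
format_x(1:2)  # "1 and 2"
```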

kent37 · June 22, 2018, 1:40pm

Ah, <light goes on>. Thank you @hoelk.

hoelk · June 22, 2018, 1:44pm

For your case there is an even simpler (but not very flexible) solution, btw:

paste(x, collapse = " and ")
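Wrapped in a function (format_x_simple() is a hypothetical name), the one-liner always returns a single string because collapse joins all elements. The last two calls show where it stops being flexible:

```r
format_x_simple <- function(x) paste(x, collapse = " and ")

format_x_simple(2)           # "2"
format_x_simple(1:2)         # "1 and 2"
format_x_simple(1:3)         # "1 and 2 and 3"
format_x_simple(integer(0))  # ""
```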

kent37 · June 22, 2018, 5:12pm

My real use case is a little more complex. I rewrote it using switch(length(x) + 1, ...) and it works as I wanted. (I use length(x) + 1 because the length can be 0, and a numeric switch() index must be at least 1 to select an alternative.)
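A sketch of what a switch() version might look like for the format_x() example above; the real code isn't shown, so this is only an illustration. Unlike case_when(), switch() evaluates only the alternative it selects. The "" for length 0 and the NA_character_ for lengths above 2 are assumptions.

```r
format_x <- function(x) {
  # A numeric first argument picks the n-th alternative, and only that one
  # is evaluated: length 0 -> 1st, length 1 -> 2nd, length 2 -> 3rd
  out <- switch(length(x) + 1,
    "",                       # length 0 (assumed placeholder)
    as.character(x),          # length 1
    paste(x[1], "and", x[2])  # length 2
  )
  # An index past the last alternative makes switch() return NULL invisibly;
  # mapping that to NA_character_ is an assumption
  if (is.null(out)) NA_character_ else out
}

format_x(integer(0))  # ""
format_x(2)           # "2"
format_x(1:2)         # "1 and 2"
```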