Case_when: second argument affects replacement value in the first argument

dplyr
case_when

#1

Hi! I think I'm getting unexpected results from case_when(), but since I'm a newbie, I thought I should check here before filing an issue. I wasn't able to find a post about this on this site, on Stack Overflow, or in the dplyr GitHub repo issues.

eg. 1 shows the behaviour I expect. But in eg. 2, where the RHS of the second argument is length 2, the RHS of the first argument gets printed twice. This is true even when the LHS of the second argument is FALSE (eg. 3).

When I try eg. 3 with ifelse(), first is only printed once (eg. 4), so it seems to be a case_when thing..?

Thanks in advance!
Praer

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

first <- "A"
second <- c("B", "C")
third <- "D"

## eg. 1
case_when(
  TRUE ~ first,
  TRUE ~ third
)
#> [1] "A"

## eg. 2
case_when(
  TRUE ~ first,
  TRUE ~ second
)
#> [1] "A" "A"

## eg. 3
case_when(
  TRUE ~ first,
  FALSE ~ second
)
#> [1] "A" "A"

## eg. 4
ifelse(TRUE, first, second)
#> [1] "A"

length(second)
#> [1] 2

Created on 2018-09-05 by the reprex package (v0.2.0).


#2

Personally, I think it looks like a bug, but the documentation does cover your case specifically (I've added the missing "not" there and highlighted the bits that are important):

The LHS must evaluate to a logical vector. The RHS does not need to be logical, but all RHSs must evaluate to the same type of vector.
Both LHS and RHS may have the same length of either 1 or n. The value of n must be consistent across all cases. The case of n == 0 is treated as a variant of n != 1.

This means that, from case_when's point of view, it is doing the right thing in your second and third examples. Specifically, it would complain if you did something like this:

library(dplyr)
first <- "A"
second <- c("B", "C")
third <-  c("D", "E", "F")

case_when(
  TRUE ~ first,
  TRUE ~ second,
  TRUE ~ third
)
#> Error: `TRUE ~ third` must be length 2 or one, not 3

Created on 2018-09-05 by the reprex package (v0.2.0).

But if you do this:

library(dplyr)
first <- "A"
second <- c("B", "C", "3")
third <-  c("D", "E", "F")

case_when(
  TRUE ~ first,
  TRUE ~ second,
  TRUE ~ third
)
#> [1] "A" "A" "A"

everything is as in your example. Specifically, case_when recycles every RHS to the common length n, so the result always has length n, even when only the first case is ever used.
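To show where this recycling is useful, here is a minimal sketch in which every LHS is a logical vector of length 3 and every RHS is length 1. The vector x and the labels are made up for illustration and are not from the original question.

```r
library(dplyr)

# Hypothetical input vector, used only for illustration.
x <- c(1, 5, 10)

# Each LHS is a logical vector of length 3 (n = 3).
# Each RHS has length 1 and is recycled to length 3.
# For every element, the first condition that is TRUE decides the value.
labels <- case_when(
  x < 3  ~ "low",
  x < 8  ~ "mid",
  x >= 8 ~ "high"
)
labels
#> [1] "low"  "mid"  "high"
```

If all you want is a single value from a single TRUE/FALSE condition, a plain `if (cond) first else second` returns whichever branch is chosen as-is, with no recycling.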


#3

I see! I misunderstood the documentation - thank you so much for the clarification! :blush:


#4

:+1: Yeah, @sowla noticed that, too! It's fixed in the R and .Rd files, and will be propagated to the HTML docs the next time we rebuild the site.