case_when() executes custom function although condition isn't met

RSzt · February 25, 2019, 11:28pm

Could you please explain why case_when() executes the own() function even though the condition isn't met? I know the output is fine (NA), but when the condition isn't met the function shouldn't be executed at all, to avoid errors. In the example below, "own!" is printed four times; my expectation was no prints.

own <- function(x){
  
  print("own!")
  return(1)
}

dplyr::tibble(a = 1:4) %>%
dplyr::group_by(a) %>%
dplyr::mutate(x = 
                dplyr::case_when(
                  1 == 2 ~ own(a),
                  TRUE ~ NA_real_
                  )
              )

Is there a better way to handle cases like this within dplyr?

mara · February 25, 2019, 11:57pm

Adding a reprex so others can see the output:

suppressPackageStartupMessages(library(tidyverse))
own <- function(x){
  
  print("own!")
  return(1)
}

dplyr::tibble(a = 1:4) %>%
  dplyr::group_by(a) %>%
  dplyr::mutate(x = 
                  dplyr::case_when(
                    1 == 2 ~ own(a),
                    TRUE ~ NA_real_
                  )
  )
#> [1] "own!"
#> [1] "own!"
#> [1] "own!"
#> [1] "own!"
#> # A tibble: 4 x 2
#> # Groups:   a [4]
#>       a     x
#>   <int> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
#> 4     4    NA

Created on 2019-02-25 by the reprex package (v0.2.1)

cderv · February 25, 2019, 11:58pm

case_when() evaluates every LHS and RHS expression first, and only then picks values based on the conditions. So your print statement runs once for each group of a (four times here) during the RHS evaluation, even though its result is never used. This is how case_when() works.
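To make the eager evaluation visible without relying on printed output, here is a minimal sketch that counts calls instead. The `calls` counter and the vectorised version of `own()` are assumptions introduced for illustration; they are not part of the original question.

```r
library(dplyr)

calls <- 0  # hypothetical counter, used only to observe evaluation

own <- function(x) {
  calls <<- calls + 1   # side effect instead of print()
  rep(1, length(x))     # one value per input element
}

a <- 1:4

res <- case_when(
  rep(FALSE, length(a)) ~ own(a),  # this condition is never TRUE
  TRUE ~ NA_real_
)

res    # all NA, because only the fallback branch is selected
calls  # non-zero if own() was evaluated despite the FALSE condition
```

In the dplyr versions I am aware of, `calls` ends up at 1 here: the RHS is computed once for the whole vector and then discarded.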

Why do you need to print inside the own() function?
If it is only for logging purposes, you can use message() and then suppressMessages() when you don't want the messages printed:

own <- function(x){
  message("own!")
  return(1)
}
library(magrittr)
dplyr::tibble(a = 1:4) %>%
  dplyr::group_by(a) %>%
  dplyr::mutate(x = 
                  dplyr::case_when(
                    1 == 2 ~ suppressMessages(own(a)),
                    TRUE ~ NA_real_
                  )
  )
#> # A tibble: 4 x 2
#> # Groups:   a [4]
#>       a     x
#>   <int> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
#> 4     4    NA

Created on 2019-02-26 by the reprex package (v0.2.1)

hughparsonage · February 26, 2019, 2:22pm

Use if:

own <- function(x) {
  print("own!")
  return(1)
}
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dplyr::tibble(a = 1:4) %>%
  dplyr::group_by(a) %>%
  mutate(x = if (1 == 2) own(a) else NA_real_)
#> # A tibble: 4 x 2
#> # Groups:   a [4]
#>       a     x
#>   <int> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
#> 4     4    NA

RSzt · February 26, 2019, 7:08pm

Thank you very much for all the answers!

@mara and @cderv: I used print() only as an example to show that this part is executed every time. The real code is more complicated (it doesn't print anything, it just calculates). In my case, the mutate() with case_when() is part of a longer dplyr pipeline, and I was wondering whether there is a way to control the execution of a particular mutate(). My pipeline is not a single-case script; it has to handle a range of cases. One of these throws an error, which is expected because not all of the required data is provided, so I wanted to skip that step by adding a case_when() condition. Now (thanks to @cderv) I know that the RHS is always evaluated even when the LHS condition is not met, so the RHS raises an error whenever own() runs in a case where it is not supposed to run.

@hughparsonage This is indeed the solution I was looking for. I am just not sure whether using if ... else inside dplyr's mutate() is the "tidyverse way" of doing this?
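The failure mode described in this post can be reproduced in a few lines. This is only a sketch: `own()` here is a hypothetical stand-in that always fails, and `data_ready` is an assumed length-1 flag representing "the required data was provided".

```r
library(dplyr)

# Hypothetical stand-in for a computation that fails when its data is missing
own <- function(x) stop("required data not available")

data_ready <- FALSE  # assumed external, length-1 flag

# case_when() evaluates the right-hand side eagerly, so the error surfaces
# even though the condition is FALSE
eager <- tryCatch(
  case_when(data_ready ~ own(1:4), TRUE ~ NA_real_),
  error = function(e) "error"
)

# if ... else only evaluates the branch that is actually selected
lazy <- if (data_ready) own(1:4) else NA_real_
```

Note that `if` needs a single TRUE/FALSE value, so this pattern fits a condition that applies to the whole step (or the whole group) rather than to individual rows.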

grrrck · February 26, 2019, 7:56pm

In this case, I would recommend refactoring the step that needs to be controlled into a separate function. That way, your logic around controlling execution is clearly separated and the pipe chain can still be a more or less linear flow of steps.

I usually do something similar to the following when I run into this sort of setup.

library(tidyverse)

# Write a tidy, pipe-compliant function that 
# takes and returns a `tibble` or `data.frame`
choose_forking_path <- function(df, condition)  {
  if (condition) {
    mutate(df, x = "own")
  } else {
    mutate(df, x = NA_real_)
  }
}

Then, if an external_condition (or function argument, etc) is set, this function handles the logic around the specifics of the call to mutate().

external_condition <- FALSE

tibble(a = 1:4) %>% 
  group_by(a) %>% 
  choose_forking_path(external_condition)
#> # A tibble: 4 x 2
#> # Groups:   a [4]
#>       a     x
#>   <int> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
#> 4     4    NA

external_condition <- TRUE

tibble(a = 1:4) %>% 
  group_by(a) %>% 
  choose_forking_path(external_condition)
#> # A tibble: 4 x 2
#> # Groups:   a [4]
#>       a x    
#>   <int> <chr>
#> 1     1 own  
#> 2     2 own  
#> 3     3 own  
#> 4     4 own

For completeness, you can even embed the if … else into your pipe chain, but I think this style gets messy and difficult to follow very quickly.

tibble(a = 1:4) %>% 
  group_by(a) %>% 
  {
    if (external_condition) {
      mutate(., x = "own")
    } else {
      mutate(., x = NA_real_)
    }
  }
#> # A tibble: 4 x 2
#> # Groups:   a [4]
#>       a x    
#>   <int> <chr>
#> 1     1 own  
#> 2     2 own  
#> 3     3 own  
#> 4     4 own

Created on 2019-02-26 by the reprex package (v0.2.1)

nwerth · February 26, 2019, 7:59pm

You could wrap it in a function and use lapply() or vapply():

library(dplyr)

even_odd <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

tibble(a = 1:4) %>%
  mutate(x = vapply(X = a, FUN = even_odd, FUN.VALUE = character(1)))
# # A tibble: 4 x 2
#       a x    
#   <int> <chr>
# 1     1 odd  
# 2     2 even 
# 3     3 odd  
# 4     4 even
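Applied to the original problem, the same element-wise idea can guard a function that would otherwise fail. The sketch below uses base R only; `risky_sqrt()` and `guarded_sqrt()` are hypothetical names made up for this illustration.

```r
# Hypothetical function that fails on inputs it cannot handle
risky_sqrt <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# Guard: only call risky_sqrt() when the condition holds for this element
guarded_sqrt <- function(x) {
  if (x >= 0) risky_sqrt(x) else NA_real_
}

values <- c(4, -1, 9)

# vapply() calls guarded_sqrt() once per element, so risky_sqrt()
# never sees the negative value
result <- vapply(values, guarded_sqrt, FUN.VALUE = numeric(1))
result
#> [1]  2 NA  3
```

Inside a pipeline this would become something like `mutate(x = vapply(a, guarded_sqrt, numeric(1)))`. The trade-off is that the function is called once per row rather than once per vector, which can be slower on large data.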
