Unexpected ifelse outcome

Hi,

I have a logical, tmpFlag, defined outside of the tibble ttmp.
Could somebody explain why my first version of the ifelse command gives the wrong output, and why the second version fixes it by adding rowwise()? The third version is correct, as expected.
Thanks.

Ha

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
#> Warning: package 'purrr' was built under R version 4.1.2
#> Warning: package 'dplyr' was built under R version 4.1.3
#> Warning: package 'stringr' was built under R version 4.1.2
#> Warning: package 'forcats' was built under R version 4.1.2
#tidyverse: version 1.3.1

tmpFlag = FALSE
ttmp = tibble(x=c(1:4, NA)) 

# Wrong answer
ttmp %>%
  mutate(y=ifelse(tmpFlag, NA, x))
#> # A tibble: 5 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     3     1
#> 4     4     1
#> 5    NA     1


# Correct answer by adding rowwise()

ttmp %>%
  rowwise() %>%
  mutate(y=ifelse(tmpFlag, NA, x))
#> # A tibble: 5 x 2
#> # Rowwise: 
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     4     4
#> 5    NA    NA

# This has a correct output as expected

ttmp %>%
  mutate(Flag=tmpFlag,
    y=ifelse(Flag, NA, x))
#> # A tibble: 5 x 3
#>       x Flag      y
#>   <int> <lgl> <int>
#> 1     1 FALSE     1
#> 2     2 FALSE     2
#> 3     3 FALSE     3
#> 4     4 FALSE     4
#> 5    NA FALSE    NA

Created on 2023-03-14 by the reprex package (v2.0.1)

Hello!

The seemingly unexpected result that you get from the first code has two causes:

A. When you create a new column in a data frame/tibble and provide only a single value, that value is automatically repeated enough times to populate the entire column.

For example, if you want to add a status column to the mtcars dataset, which contains the word "new", you can do it this way. Notice how we do not need to repeat "new" ourselves:

library(dplyr)

mtcars %>% mutate(status = "new") %>% head()

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb status
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    new
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    new
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1    new
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    new
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    new
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    new

B. The ifelse() function is vectorized: it evaluates each element of the condition and returns a result with the same length as the condition. In other words, the number of values it returns equals the number of conditions provided to it, regardless of how long the yes and no values are.
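As a small base R sketch of point B (the names `vals`, `cond_one`, and `cond_all` are made up for illustration), the length of the result follows the length of the condition, not the length of the values:

```r
vals <- c(10L, 20L, 30L)

# Condition of length 1: the result has length 1,
# so only the first element of `vals` is returned
cond_one <- FALSE
res_one <- ifelse(cond_one, NA, vals)
res_one
#> [1] 10

# Condition of length 3: one result per element
cond_all <- c(FALSE, TRUE, FALSE)
res_all <- ifelse(cond_all, NA, vals)
res_all
#> [1] 10 NA 30
```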

Here is your first code - the code which gives you the result you don't want:

ttmp %>%
  mutate(y=ifelse(tmpFlag, NA, x))

tmpFlag is a vector of length 1: it contains a single FALSE. This means that ifelse() returns a result of length 1. Since the condition is FALSE, the function returns the first value of x, which is 1, and ignores the rest. This is where point A above kicks in: the value 1 is repeated enough times to populate the entire y column.

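The third version works as expected because the condition is Flag, a column with as many elements as there are rows in ttmp. The second version works for a related reason: with rowwise(), mutate() evaluates the expression one row at a time, so x is also a single value each time and ifelse() returns that row's x.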
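If you would rather not add a helper column or use rowwise(), here are two possible alternatives, sketched under the assumption that tmpFlag is always a single TRUE or FALSE (the names `res_if` and `res_rep` are only for illustration):

```r
library(dplyr)

tmpFlag <- FALSE
ttmp <- tibble(x = c(1:4, NA))

# Option 1: a plain if/else, since the flag is a single value.
# The whole x column is returned when the flag is FALSE.
res_if <- ttmp %>%
  mutate(y = if (tmpFlag) NA_integer_ else x)

# Option 2: repeat the flag once per row so that ifelse()
# receives a condition as long as the column
res_rep <- ttmp %>%
  mutate(y = ifelse(rep(tmpFlag, n()), NA, x))
```

In both cases y should match x row by row while tmpFlag is FALSE, and become all NA when it is TRUE.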

Do not hesitate to ask any questions you may have.


Hi Gueyenono,

Thanks for your explanation.

Ha
