case_when gives me different results using !is.na(x) than x!=NA

Hello,
I have a problem with dplyr and case_when using !is.na.
If I run this


data<-data %>% 
    mutate(tax=case_when(z13!=NA ~0
                            ,zone==1 & z13 %in% 2:5 ~100
                            ,zone==2 & z13 %in% 3:5 ~100))

the data is transformed the way I expect. However, when I tried this


data<-data %>% 
    mutate(tax=case_when(!is.na(z13)~0
                            ,zone==1 & z13 %in% 2:5 ~100
                            ,zone==2 & z13 %in% 3:5 ~100))

Everything was set to zero. I mean, every value of "tax" was zero.
The data in "z13" is of integer type. The first version runs, but I think the correct way is to always use is.na or !is.na.
Can you tell me what I am doing wrong in the second code chunk?
Thanks for your help and time.
Have a nice weekend.

This is probably not the behavior you expected!

x <- NA
y <- 64

x == NA
#> [1] NA
y == NA
#> [1] NA

x != NA
#> [1] NA
y != NA
#> [1] NA

Created on 2022-06-16 by the reprex package (v2.0.1)

For the first, z13 != NA is NA for all values of z13, so it is never TRUE and every row gets passed on to the next condition.

For the second, !is.na(z13) is TRUE whenever z13 is not NA, so those rows match the first condition and never reach the others. Why do you want to assign a value of zero there?
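
To make the difference concrete, here is a minimal sketch on made-up data. Only the column names zone and z13 come from the question; the toy values and the helper columns cmp_na, not_na, tax_cmp and tax_isna are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question.
toy <- data.frame(
  zone = c(1L, 1L, 2L, 2L, 1L),
  z13  = c(2L, NA, 4L, 1L, 1L)
)

res <- toy %>%
  mutate(
    cmp_na = z13 != NA,      # NA on every row, never TRUE
    not_na = !is.na(z13),    # TRUE wherever z13 has a value
    # First version: the first condition never matches, so rows fall through.
    tax_cmp = case_when(
      z13 != NA ~ 0,
      zone == 1 & z13 %in% 2:5 ~ 100,
      zone == 2 & z13 %in% 3:5 ~ 100
    ),
    # Second version: every non-missing z13 matches the first condition.
    tax_isna = case_when(
      !is.na(z13) ~ 0,
      zone == 1 & z13 %in% 2:5 ~ 100,
      zone == 2 & z13 %in% 3:5 ~ 100
    )
  )

res$tax_cmp
#> [1] 100  NA 100  NA  NA
res$tax_isna
#> [1]  0 NA  0  0  0
```

So the first version only looked right because its first line was effectively ignored.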


I posted about this some time ago.
I remember that the solution was using an anti join.

You should consider your hierarchy when your case_when possibilities overlap.
Not being NA overlaps with z13 being between 2 and 5, and because it comes first, it is the one that gets assigned.
Change the order for different results: if you need "not NA" to be a fallback when the other two options don't match, make it the 3rd option rather than the 1st. In case_when you will often add a final fallback if your earlier options do not exhaust all possibilities; this can be done by using the logical value TRUE as the condition to match on.

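d2 <- data %>% 
    mutate(tax = case_when(
        zone == 1 & z13 %in% 2:5 ~ 100,   # specific rules first
        zone == 2 & z13 %in% 3:5 ~ 100,
        !is.na(z13) ~ 0,                  # any other non-missing z13
        TRUE ~ -99999999                  # fallback: only reached when z13 is NA
    ))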
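
As a rough check of the reordered version, here is the same logic run on a small made-up data frame. The toy values are an assumption; only zone, z13 and tax come from the thread.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the thread.
toy <- data.frame(
  zone = c(1L, 1L, 2L, 2L, 1L),
  z13  = c(2L, NA, 4L, 1L, 1L)
)

d2_toy <- toy %>%
  mutate(tax = case_when(
    zone == 1 & z13 %in% 2:5 ~ 100,
    zone == 2 & z13 %in% 3:5 ~ 100,
    !is.na(z13) ~ 0,
    TRUE ~ -99999999   # sentinel from the answer; only the NA row lands here
  ))

d2_toy$tax
#> [1]       100 -99999999       100         0         0
```

If you would rather keep missing rows as missing than use a sentinel, replacing the last line with TRUE ~ NA_real_ should do that.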

Thanks, nirgrahamuk.
The issue is that I am translating some Stata code into the tidyverse. It is the official syntax from a survey.
The problem arises when I deal with missing values. Stata handles them very differently from R. In Stata it is usual to generate a variable with some value only where a specific column is non-missing. Translating that into R always gives me problems.
Thanks again, nirgrahamuk.
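
For the Stata translation, one possible sketch is below. The original Stata code is not shown in this thread, so the gen/replace lines in the comments are a guess at what it might look like, and the toy data is made up. The point is that a gen followed by replace statements is "last match wins", while case_when is "first match wins", so a line-by-line translation into case_when reverses the priority. Chaining if_else steps keeps the Stata order.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the thread.
toy <- data.frame(
  zone = c(1L, 1L, 2L, 2L, 1L),
  z13  = c(2L, NA, 4L, 1L, 1L)
)

# Assumed Stata pattern (not taken from the thread):
#   gen tax = 0 if !missing(z13)
#   replace tax = 100 if zone == 1 & inrange(z13, 2, 5)
#   replace tax = 100 if zone == 2 & inrange(z13, 3, 5)
stata_style <- toy %>%
  mutate(tax = if_else(!is.na(z13), 0, NA_real_)) %>%                  # gen ... if
  mutate(tax = if_else(zone == 1 & z13 %in% 2:5, 100, tax)) %>%        # replace ... if
  mutate(tax = if_else(zone == 2 & z13 %in% 3:5, 100, tax))            # replace ... if

stata_style$tax
#> [1] 100  NA 100   0   0
```

Here %in% returns FALSE for a missing z13, so those rows keep NA. If zone could also be missing, the conditions themselves could become NA and would need extra handling.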