case_when gives me different results using !is.na(x) than x!=NA

I have a problem with dplyr and case_when when using !is.na().
If I run this:

data<-data %>% 
    mutate(tax=case_when(z13!=NA ~0
                            ,zone==1 & z13 %in% 2:5 ~100
                            ,zone==2 & z13 %in% 3:5 ~100))

I transform the data correctly. However, when I tried this:

data<-data %>% 
    mutate(tax=case_when(!is.na(z13) ~0
                            ,zone==1 & z13 %in% 2:5 ~100
                            ,zone==2 & z13 %in% 3:5 ~100))

Everything was set to zero. I mean, all values of "tax" were zero.
The data in "z13" is of integer type. The first version runs, but I think the correct way is to always use is.na() or !is.na().
Can you tell me what I am doing wrong in the second code chunk?
Thanks for your help and time.
Have a nice weekend.

This is probably not the behavior you expected!

x <- NA
y <- 64

x == NA
#> [1] NA
y == NA
#> [1] NA

x != NA
#> [1] NA
y != NA
#> [1] NA

Created on 2022-06-16 by the reprex package (v2.0.1)

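For the first, z13 != NA evaluates to NA for every value of z13, so it is never TRUE and each row gets passed on to the next condition.

For the second, whenever z13 is not NA, !is.na(z13) is TRUE, so that row is assigned zero before the later conditions are ever checked. Why do you want to assign a value of zero there?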
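
To make the difference concrete, here is a small sketch on made-up data. The tibble `toy` and the column names `cmp_ne`, `not_na`, `tax_ne` and `tax_isna` are my own, chosen only for illustration; the conditions are copied from your two chunks.

```r
library(dplyr)

# Hypothetical toy data, not the original survey file
toy <- tibble(
  zone = c(1, 1, 2, 2),
  z13  = c(3L, NA, 4L, 1L)
)

res <- toy %>%
  mutate(
    cmp_ne   = z13 != NA,    # NA on every row: a comparison with NA is never TRUE
    not_na   = !is.na(z13),  # TRUE wherever z13 holds a value
    # First chunk: the NA condition never matches, so the later rules decide
    tax_ne   = case_when(z13 != NA ~ 0,
                         zone == 1 & z13 %in% 2:5 ~ 100,
                         zone == 2 & z13 %in% 3:5 ~ 100),
    # Second chunk: every non-missing z13 matches the first rule and becomes 0
    tax_isna = case_when(!is.na(z13) ~ 0,
                         zone == 1 & z13 %in% 2:5 ~ 100,
                         zone == 2 & z13 %in% 3:5 ~ 100)
  )

res
```

In `tax_ne` only the zone rules ever fire, while in `tax_isna` every row with a non-missing z13 is caught by the first rule, which is why your whole column came out as zero.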


I posted about this some time ago.
I remember that the solution was using an anti join.

You should consider your hierarchy when your case_when possibilities overlap.
z13 not being NA overlaps with z13 being between 2 and 5, and case_when uses the first condition that matches, so the not-NA rule gets assigned first.
Change the order for different results. If you need "not NA" to be a fallback for rows where the other two options don't match, make it the 3rd option rather than the 1st. In case_when you will often add a fallback when your earlier options do not exhaust all possibilities; this can be done by using the logical constant TRUE as the condition to match on.

d2 <- data %>% 
    mutate(tax=case_when(    zone==1 & z13 %in% 2:5 ~100
                            ,zone==2 & z13 %in% 3:5 ~100
                            ,TRUE ~ -99999999))
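
If the not-NA rule is meant as the fallback, a variant along these lines might be closer to what you want. This is only a sketch: the tibble `toy` is made up, and I am assuming the intent is 100 for the listed zone/z13 combinations, 0 for any other non-missing z13, and NA where z13 is missing.

```r
library(dplyr)

# Hypothetical toy data standing in for the survey file
toy <- tibble(
  zone = c(1, 2, 2, 1),
  z13  = c(3L, 4L, 1L, NA)
)

res <- toy %>%
  mutate(tax = case_when(
    zone == 1 & z13 %in% 2:5 ~ 100,      # specific rules come first
    zone == 2 & z13 %in% 3:5 ~ 100,
    !is.na(z13)              ~ 0,        # fallback for any other non-missing z13
    TRUE                     ~ NA_real_  # missing z13 stays missing
  ))

res
```

The last TRUE line is optional here, since rows that match no condition are NA by default, but spelling it out documents the intent.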

Thanks, nirgrahamuk.
The issue is that I am translating some Stata code into tidyverse: the official syntax from a survey.
The problem arises when I deal with missing values. Stata handles missing values very differently from how R handles NA. In Stata it is usual to generate a variable with some value only where a specific column is non-missing. Translating that into R always gives me problems.
Thanks again, nirgrahamuk.
