Case_when function for recoding variables

anaisvb · November 13, 2020, 5:22pm

Hi,

I am pretty new to R. I am recoding a variable with the case_when function, changing only some of the values according to a set of rules, but when I print the new variable, the values I didn't include in case_when have changed as well.

The variable I want to recode is a mixed variable that has both character and numeric values. I am using case_when to turn the character values into numbers. However, when I print the new variable (expecting the old numeric values plus the newly recoded ones), the old numeric values have changed too.

Below is the code I'm using.

pobgit_p %>% 
  mutate(P57_2_num = case_when(
    P57_2 == "No" ~ 0,
    P57_2 == "NS" ~ 0,
    P57_2 == "NC" ~ 0,
  ))

I don't get what I am doing wrong.

Many thanks in advance

joels · November 13, 2020, 5:51pm

Here are a few examples that will hopefully clarify what's going wrong. Note that I've used the %in% operator to reduce the number of lines of code needed. Also, add TRUE ~ P57_2 as the last condition to ensure non-recoded values keep their original values (rather than being set to missing).

library(tidyverse)

# Fake data
d = tibble(x=c(5, 10, "No", 1, "Yes", "Y", "NS", "NC"))

# The recoded values are set to zero, but all the other values are set 
# to missing
d %>% 
  mutate(x_recode = case_when(x %in% c("No","NS","NC") ~ 0))
#> # A tibble: 8 x 2
#>   x     x_recode
#>   <chr>    <dbl>
#> 1 5           NA
#> 2 10          NA
#> 3 No           0
#> 4 1           NA
#> 5 Yes         NA
#> 6 Y           NA
#> 7 NS           0
#> 8 NC           0

# This errors out because `case_when` expects all values to have the 
# same class (character, numeric, etc.) but here we tried to recode some 
# values to numeric while leaving some as their original character values
d %>% 
  mutate(x_recode = case_when(x %in% c("No","NS","NC") ~ 0,
                              TRUE ~ x))
#> Error: Problem with `mutate()` input `x_recode`.
#> x must be a double vector, not a character vector.
#> ℹ Input `x_recode` is `case_when(x %in% c("No", "NS", "NC") ~ 0, TRUE ~ x)`.

# This works because we assigned "0", which is character class, so all the 
# values are character
d %>% 
  mutate(x_recode = case_when(x %in% c("No","NS","NC") ~ "0",
                              TRUE ~ x))
#> # A tibble: 8 x 2
#>   x     x_recode
#>   <chr> <chr>   
#> 1 5     5       
#> 2 10    10      
#> 3 No    0       
#> 4 1     1       
#> 5 Yes   Yes     
#> 6 Y     Y       
#> 7 NS    0       
#> 8 NC    0

# Now we've recoded all the character values to numbers, but we're still 
# keeping everything as character class
d %>% 
  mutate(x_recode = case_when(x %in% c("No","NS","NC") ~ "0",
                              x %in% c("Yes","Y") ~ "1",
                              TRUE ~ x))
#> # A tibble: 8 x 2
#>   x     x_recode
#>   <chr> <chr>   
#> 1 5     5       
#> 2 10    10      
#> 3 No    0       
#> 4 1     1       
#> 5 Yes   1       
#> 6 Y     1       
#> 7 NS    0       
#> 8 NC    0

# Finally, change the class to numeric, so that we end up with a column of 
# numeric values, rather than character values
d %>% 
  mutate(x_recode = case_when(x %in% c("No","NS","NC") ~ "0",
                              x %in% c("Yes","Y") ~ "1",
                              TRUE ~ x),
         x_recode = as.numeric(x_recode))
#> # A tibble: 8 x 2
#>   x     x_recode
#>   <chr>    <dbl>
#> 1 5            5
#> 2 10          10
#> 3 No           0
#> 4 1            1
#> 5 Yes          1
#> 6 Y            1
#> 7 NS           0
#> 8 NC           0

Created on 2020-11-13 by the reprex package (v0.3.0)
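
Before the final as.numeric() step, it can help to check which values would not convert cleanly, since any label left unrecoded becomes NA (with a warning). A minimal sketch, reusing the fake values from above; `leftover` is only an illustrative name, and base ifelse() stands in for case_when here just to keep the snippet free of dependencies:

```r
# Same fake values as above, stored as a plain character vector
x <- c("5", "10", "No", "1", "Yes", "Y", "NS", "NC")

# Recode only some labels, deliberately leaving "Yes" and "Y" untouched
x_recode <- ifelse(x %in% c("No", "NS", "NC"), "0", x)

# Values that as.numeric() cannot parse come back as NA;
# list the distinct labels that would be lost
leftover <- unique(x_recode[is.na(suppressWarnings(as.numeric(x_recode)))])
leftover
#> [1] "Yes" "Y"
```

If `leftover` is empty, every value is already a number stored as text and the conversion will not introduce any NAs.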

anaisvb · November 14, 2020, 10:28am

Many thanks! I recoded the variable again (this time I'm trying to use reprex, I didn't do it in the first post, sorry!)

library(tidyverse)
pobgit_p <- pobgit[sample(nrow(pobgit), 10), ] #This is the dataframe I am using as a test, just to practice
#> Error in eval(expr, envir, enclos): object 'pobgit' not found
pobgit_p$P57_2_num <- as.character(pobgit_p$P57_2) # Transform the factor to character
#> Error in eval(expr, envir, enclos): object 'pobgit_p' not found
pobgit_p %>% mutate(P57_2_num = case_when(P57_2_num %in% c("No", "NS", "NC") ~ "0", 
                                    TRUE ~ P57_2_num)) # Recode the variable 
#> Error in eval(lhs, parent, parent): object 'pobgit_p' not found
pobgit_p$P57_2_num <- as.numeric(pobgit_p$P57_2_num) # Transform to numeric, but NAs are introduced by coercion
#> Error in eval(expr, envir, enclos): object 'pobgit_p' not found
pobgit_p$P57_2_num[is.na(pobgit_p$P57_2_num)] <- 0 # Convert the NA to 0
#> Error in pobgit_p$P57_2_num[is.na(pobgit_p$P57_2_num)] <- 0: object 'pobgit_p' not found
glimpse(pobgit_p$P57_2_num) 
#> Error in glimpse(pobgit_p$P57_2_num): object 'pobgit_p' not found

The original variable was a factor, so I first transformed it to character, then recoded it with the new values, and after that transformed it to numeric. The problem is that when I do the last step, NAs are introduced by coercion. In a very rudimentary way, I just changed them to 0, but I am not sure this is the most efficient way to do it. I have to reproduce this recoding for multiple variables, so I suspect I am making things more complicated than they need to be.

Super thanks again!
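
One likely reason for the NAs in the code above: the result of mutate() is never assigned back to pobgit_p, so the recoded column is printed and then discarded, and as.numeric() still sees "No", "NS" and "NC". Below is a minimal sketch of doing the whole recode in one helper and applying it to several columns with across(). The data frame is a made-up stand-in for pobgit_p, and both the column P57_3 and the helper name recode_to_numeric are hypothetical:

```r
library(dplyr)

# Made-up stand-in for pobgit_p: factor columns mixing numbers and labels
pobgit_p <- tibble(
  P57_2 = factor(c("3", "No", "12", "NS", "NC")),
  P57_3 = factor(c("NC", "7", "No", "1", "20"))  # hypothetical second column
)

# Hypothetical helper: recode the listed labels to "0", keep everything
# else, then convert the whole vector to numeric
recode_to_numeric <- function(x, zero_labels = c("No", "NS", "NC")) {
  x <- as.character(x)  # factor -> character (the labels, not the integer codes)
  x <- case_when(x %in% zero_labels ~ "0",
                 TRUE ~ x)
  as.numeric(x)         # any label not listed in zero_labels would become NA here
}

# Assign the result back; without `<-` the recoded columns are only printed
pobgit_p <- pobgit_p %>%
  mutate(across(c(P57_2, P57_3), recode_to_numeric))
```

This version overwrites the original columns; to keep them, copy them under new names first. Converting through as.character() matters because calling as.numeric() directly on a factor returns its internal integer codes rather than the numbers shown in the labels.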
