Recoding factors using if_else

forcats

#1

Is there a clever way to recode a factor based on another variable (so using if_else and adding new factor levels)?
My current work-around is changing it to a character but this doesn't retain the order of the factor levels:

library(tidyverse)

mydata = data_frame(country   = c("UK",     "Canada",        "Maldives",   "Seychelles"),
                    continent = c("Europe", "North America", "Seven seas", "Seven seas"),
                    myorder   = c(2,        1,                3,            4)) %>% 
  mutate(continent = fct_reorder(continent, myorder))

mydata$continent
#> [1] Europe        North America Seven seas    Seven seas   
#> Levels: North America Europe Seven seas

mydata %>% 
  mutate(mycontinent = continent) %>% 
  mutate(mycontinent = if_else(country == "Maldives",   "Africa", mycontinent)) %>% 
  mutate(mycontinent = if_else(country == "Seychelles", "Asia",   mycontinent))
#> Error in mutate_impl(.data, dots): Evaluation error: `false` must be type character, not integer.


newdata = mydata %>% 
  mutate(mycontinent = as.character(continent)) %>% 
  mutate(mycontinent = if_else(country == "Maldives",   "Africa", mycontinent)) %>% 
  mutate(mycontinent = if_else(country == "Seychelles", "Asia",   mycontinent))

newdata
#> # A tibble: 4 x 4
#>   country    continent     myorder mycontinent  
#>   <chr>      <fct>           <dbl> <chr>        
#> 1 UK         Europe              2 Europe       
#> 2 Canada     North America       1 North America
#> 3 Maldives   Seven seas          3 Africa       
#> 4 Seychelles Seven seas          4 Asia

newdata$mycontinent
#> [1] "Europe"        "North America" "Africa"        "Asia"

So I now have to turn mycontinent back into a factor.
(In this example that is easy enough to do, as I had the non-alphabetical order recorded in a variable, but sometimes I've used fct_relevel() in a previous code chunk and would then have to copy or move that code to after this step.)

Created on 2018-09-10 by the reprex package (v0.2.0).


#2

If you know all possible levels for the factor, you can provide them when first creating it.
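
A minimal sketch of that suggestion in base R, assuming the extra continents are known in advance. The name known_levels is hypothetical and the level order follows the example in #1:

```r
# Assumed: every continent that may appear later is known up front.
known_levels <- c("North America", "Europe", "Seven seas", "Africa", "Asia")

continent <- factor(
  c("Europe", "North America", "Seven seas", "Seven seas"),
  levels = known_levels
)

# The levels keep the order given above, including the two not yet used.
levels(continent)

# Assigning a value that is already a level works without a warning.
continent[3] <- "Africa"
```

Because "Africa" and "Asia" are already levels, later assignments do not need a round trip through character.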


#3

Maybe I'm not understanding the problem clearly, but could you use forcats::fct_relabel(), moving your assignments inside the function passed to fct_relabel()?


#4

Thanks! I tried fct_expand(), but it didn't help:

test = mydata %>%
  mutate(mycontinent = fct_expand(continent, "Africa", "Asia"))

test$mycontinent
#> [1] Europe        North America Seven seas    Seven seas   
#> Levels: North America Europe Seven seas Africa Asia

test %>%
  mutate(mycontinent = if_else(country == "Maldives",   factor("Africa"), mycontinent)) %>%
  mutate(mycontinent = if_else(country == "Seychelles", factor("Asia"),   mycontinent))
#> Warning in `[<-.factor`(`*tmp*`, i, value = structure(c(2L, 1L, 3L), .Label
#> = c("North America", : invalid factor level, NA generated

#> Warning in `[<-.factor`(`*tmp*`, i, value = structure(c(2L, 1L, 3L), .Label
#> = c("North America", : invalid factor level, NA generated
#> # A tibble: 4 x 4
#>   country    continent     myorder mycontinent
#>   <chr>      <fct>           <dbl> <fct>      
#> 1 UK         Europe              2 <NA>       
#> 2 Canada     North America       1 <NA>       
#> 3 Maldives   Seven seas          3 <NA>       
#> 4 Seychelles Seven seas          4 Asia

#5

Hmm, that sounds promising and I've not come across fct_relabel() before, but I don't think I can pass it more than one character vector? My issue here is I'm recoding a factor based on another column.

Tried this:

my_rename = function(x, as_is){
  if_else(x == "Maldives", "Africa", as_is)
}

mydata %>% 
  mutate(mycontinent = fct_relabel(continent, my_rename(country, continent)))
#> Error in mutate_impl(.data, dots): Evaluation error: `false` must be type character, not integer.
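
For reference, a small sketch of what fct_relabel() does. It takes a function as its second argument and applies it to the level labels, not to the rows, so it cannot see a second column such as country. The data below is a cut-down copy of the example and the "Other" label is made up for illustration:

```r
library(forcats)

continent <- factor(c("Europe", "North America", "Seven seas", "Seven seas"),
                    levels = c("North America", "Europe", "Seven seas"))

# The function receives the character vector of levels (3 values here),
# not the 4 row values, and must return one new label per level.
relabelled <- fct_relabel(continent, function(lvl) {
  ifelse(lvl == "Seven seas", "Other", lvl)
})

levels(relabelled)
```

Both "Seven seas" rows end up with the same new label, so this approach cannot send Maldives and Seychelles to different continents.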

#6

When combining factors, it's best to make sure they have the same levels beforehand. The forcats package can handle mismatched levels, but not inside other functions like if_else(). Because if_else bases its return value on the true vector,

if_else(country == "Maldives",   factor("Africa"), mycontinent)

returns a factor vector based on factor("Africa"), so its only level is "Africa". None of the other continents in mycontinent match that single level, so they become NA. The same thing happens with the Asia line, which converts every value other than "Asia" to NA. You could "solve" this by inverting the comparison:

mutate(mycontinent = if_else(country != "Maldives",   mycontinent, factor("Africa")))

But that's still not the shortest or clearest code for the task. I suggest predefining all possible levels and then using the replace() function.

all_continents <- c(
  "Europe", "North America", "South America", "Africa", "Australia", "Asia",
  "Antarctica", "Seven seas"
)

data_frame(country   = c("UK",     "Canada",        "Maldives",   "Seychelles"),
           continent = c("Europe", "North America", "Seven seas", "Seven seas"),
           myorder   = c(2,        1,                3,            4)) %>%
  mutate(mycontinent = factor(continent, all_continents)) %>%
  mutate(mycontinent = fct_reorder(mycontinent, myorder)) %>%
  mutate(mycontinent = replace(mycontinent, country == "Maldives", "Africa")) %>%
  mutate(mycontinent = replace(mycontinent, country == "Seychelles", "Asia"))
# # A tibble: 4 x 4
#   country    continent     myorder mycontinent  
#   <chr>      <chr>           <dbl> <fct>        
# 1 UK         Europe              2 Europe       
# 2 Canada     North America       1 North America
# 3 Maldives   Seven seas          3 Africa       
# 4 Seychelles Seven seas          4 Asia         
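
The NA behaviour described above, and the reason replace() works once the levels exist, can be reproduced in base R alone. This is only a sketch with made-up vectors (one_level, all_levels), not the data from the thread:

```r
# A factor whose only level is "Africa", like one built from factor("Africa").
one_level <- factor(c("Africa", "Africa"))

# Assigning a value that is not a level yields NA. Base R also warns
# "invalid factor level, NA generated"; the warning is suppressed here.
suppressWarnings(one_level[2] <- "Europe")
is.na(one_level[2])

# With the level defined beforehand, the same kind of assignment succeeds.
all_levels <- factor(c("Europe", "Seven seas"),
                     levels = c("Europe", "Seven seas", "Africa"))
all_levels <- replace(all_levels, 2, "Africa")
as.character(all_levels)
```

replace() is a thin wrapper around indexed assignment, so it follows the same rule: the new value has to be one of the factor's existing levels.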

#7

Super, thanks for the further explanation and example. That works as expected, and I've managed to add my fct_expand() in too.

mydata %>% 
  mutate(mycontinent = continent %>% fct_expand("Africa", "Asia")) %>% 
  mutate(mycontinent = replace(mycontinent, country == "Maldives", "Africa")) %>%
  mutate(mycontinent = replace(mycontinent, country == "Seychelles", "Asia"))
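
A quick way to check that the level order survives, sketched on plain vectors instead of the tibble. The fct_drop() step at the end is an optional extra that was not part of the thread:

```r
library(forcats)

country   <- c("UK", "Canada", "Maldives", "Seychelles")
continent <- factor(c("Europe", "North America", "Seven seas", "Seven seas"),
                    levels = c("North America", "Europe", "Seven seas"))

mycontinent <- fct_expand(continent, "Africa", "Asia")
mycontinent <- replace(mycontinent, country == "Maldives", "Africa")
mycontinent <- replace(mycontinent, country == "Seychelles", "Asia")

# The original order is kept; the new levels are appended at the end.
levels(mycontinent)

# "Seven seas" is no longer used by any row and can be dropped if unwanted.
levels(fct_drop(mycontinent))
```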