How to modify a tibble value in a factor column?

Hi,

Suppose I have the following tibble:

library(tidyverse)
tiny <- 
  tibble(a = 1, b = factor(1, levels = 1:2)) %>% 
  add_row(a = 2, b = NA) 
tiny
#> # A tibble: 2 x 2
#>       a b    
#>   <dbl> <fct>
#> 1     1 1    
#> 2     2 <NA>

and I want to replace the NA value in column b with 2.

First approach:

tiny %>% 
  mutate(b = case_when(is.na(b) ~ 2, TRUE ~ b))
#> Error: must be a double vector, not a `factor` object

Second approach:

tiny %>% 
  mutate(
    b = 
      case_when(
        is.na(b) ~ factor(2, levels = levels(b)), 
        TRUE ~ b
      )
  )
#> # A tibble: 2 x 2
#>       a b    
#>   <dbl> <fct>
#> 1     1 1    
#> 2     2 2

Created on 2020-07-04 by the reprex package (v0.3.0)

Is there a less verbose alternative to the second approach that resembles the first?

mutate(tiny,b = fct_explicit_na(b, "2"))

Thanks, @nirgrahamuk. It seems my example was too tiny to illustrate what I meant :slight_smile: What if I'd like to replace NAs with different values in different rows?

``` r
library(tidyverse)
tiny <- 
  tibble(a = 1, b = factor(1, levels = 1:2)) %>% 
  add_row(a = 2, b = NA) %>% 
  add_row(a = 3, b = NA) 
tiny
#> # A tibble: 3 x 2
#>       a b    
#>   <dbl> <fct>
#> 1     1 1    
#> 2     2 <NA> 
#> 3     3 <NA>
```

Created on 2020-07-04 by the reprex package (v0.3.0)

What I'm trying to get at is how to modify individual values in a factor column, whether or not they're NA, and I can only seem to do it the verbose way I showed above.

I don't really understand the requirement.
Is there some rule or pattern to base the modification on?
It's probably more informative if you present a "real world" context, because we can come up with so many solutions for toy problems. For example, you could just throw b away and rebuild it from a, or use row_number() and turn that into a factor, and so on.
If you have a completely arbitrary mapping, where each a needs a specific b, you would probably just do that with a join...
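
To make the join idea concrete, here is a minimal sketch rather than a definitive recipe. The corrections table `fixes` and its column `b_fix` are hypothetical names invented for illustration, and the sketch assumes the corrections are stored as a factor with the same levels as b. Corrections take precedence wherever one exists; all other rows keep their original value.

```r
library(dplyr)

# Three-row version of the example tibble from this thread.
tiny <- tibble(a = c(1, 2, 3), b = factor(c(1, NA, NA), levels = 1:2))

# Hypothetical corrections table: one row per value of a that needs fixing.
# b_fix is built with the same levels as tiny$b so the types stay compatible.
fixes <- tibble(
  a = c(2, 3),
  b_fix = factor(c("2", "1"), levels = levels(tiny$b))
)

fixed <- tiny %>%
  left_join(fixes, by = "a") %>%      # rows without a correction get b_fix = NA
  mutate(b = coalesce(b_fix, b)) %>%  # prefer the correction, else keep b
  select(-b_fix)

fixed
```

Because `coalesce(b_fix, b)` puts the correction first, this also overwrites entries that are wrong but not NA.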

I guess the real world scenario I have in mind is similar to correcting data-entry errors that are not systematic, so need to be handled 'manually' in an ad hoc way.

If the column were not a factor column, the first approach I used gives exactly what you want: replace this entry in the factor column with this value. However, once the column is a factor column, the tidyverse constraints imposed on tibbles seem to prevent any simple replacement.

Ah, OK. The only restriction is that case_when() checks that every branch returns the same data type, so if the result is to be a factor, each branch must return a factor. But that's what you already had yourself:

tiny %>% 
  mutate(b = case_when(is.na(b) ~ factor("2",levels(b)), TRUE ~ b))

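Base ifelse() is not type-strict, so you can use it to build a character vector and then convert the result back to a factor manually. Wrap b in as.character() so that its labels, rather than its integer codes, are used:

tiny %>% 
  mutate(b = as.factor(ifelse(is.na(b), "2", as.character(b))))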
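
One caveat with the ifelse() route, shown as a small sketch: ifelse() drops attributes, so a factor passed to it directly contributes its integer codes rather than its labels. In the thread's example the labels "1" and "2" happen to coincide with the codes, which hides the issue. The vector `f` and its levels below are made up purely for illustration.

```r
# Illustrative factor whose labels differ from its integer codes.
f <- factor(c("low", NA, "high"), levels = c("low", "high"))

# Passing the factor directly: the non-NA entries come back as codes.
codes <- ifelse(is.na(f), "high", f)

# Converting to character first keeps the labels.
labels <- ifelse(is.na(f), "high", as.character(f))

# Rebuild the factor with the original levels.
f_fixed <- factor(labels, levels = levels(f))

print(codes)
print(labels)
print(f_fixed)
```

Passing `levels = levels(f)` when rebuilding also keeps any level that does not currently occur in the data, which as.factor() would drop.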

Thanks, @nirgrahamuk: So it seems a simple replacement alternative is not available for factor columns. In my real-world case, the column is actually an ordered factor column, so I need to use

tiny %>% 
  mutate(b = case_when(is.na(b) ~ factor("2",levels(b), ordered = TRUE), TRUE ~ b))

which seems so tedious by comparison to a simple replacement!
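
For what it's worth, base R's replace() may come closer to a simple replacement. It assigns through the factor method of `[<-`, which accepts a plain character value as long as it matches an existing level, and it keeps the levels and the ordering intact. This is a sketch under the assumption that rows can be identified by a condition on a; it was not suggested in the thread itself.

```r
library(dplyr)

# Ordered-factor version of the three-row example.
tiny <- tibble(
  a = c(1, 2, 3),
  b = factor(c(1, NA, NA), levels = 1:2, ordered = TRUE)
)

fixed <- tiny %>%
  mutate(
    b = replace(b, a == 2, "2"),  # ad hoc fix for the row where a is 2
    b = replace(b, a == 3, "1")   # a different value for the row where a is 3
  )

fixed
```

A value that is not among the existing levels would become NA with a warning, so new levels still have to be added explicitly first.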

Hmm, maybe it's less tedious if you have a separate variable holding the 'pure' representation of the ordered factor and its levels, and then pick from that to fill gaps as you need. Something like

library(tidyverse)
tiny <- 
  tibble(a = 1, b = factor(1, levels = 1:2,ordered=TRUE)) %>% 
  add_row(a = 2, b = NA) 

bfac <- factor(c("1","2"),ordered = TRUE)

tiny %>% 
  mutate(b = case_when(is.na(b) ~ bfac[[2]], TRUE ~ b))

not that tiny had to have ordered = TRUE in there to maintain type compatibility

Nice :slight_smile:: I'd guess that's the closest we can get without asking to have the tibble data type compatibility constraints changed to allow simple replacement. (And did you mean to write 'note' here instead of 'not'?)

indeed, I did, yes !
