invalid factor level, NA generated

Hello,

I'm trying to create a new variable 'value' in the data frame that takes the corresponding 'src' value when 'group' is 'A' and the 'dst' value when 'group' is 'B'. But for the 'B' row I get 'NA' instead of 'C1' (the 'dst' value), together with the warning 'invalid factor level, NA generated'.

Could anyone help me understand what the issue with the factors is here?
What am I doing wrong and what is the right way to do it?

Many thanks in advance,

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.6.2

dF <- data.frame(
  group = c("A","A","B"),
  src = c("O1","O2", "O3"),
  dst = c("C1","C2","C1")
)

dF <- dF %>%
  mutate(
    value = group == "A",
    value = if_else(value == TRUE, src, dst)
  )
#> Warning in `[<-.factor`(`*tmp*`, i, value = structure(1L, .Label =
#> c("C1", : invalid factor level, NA generated

dF
#>   group src dst value
#> 1     A  O1  C1    O1
#> 2     A  O2  C2    O2
#> 3     B  O3  C1  <NA>

Created on 2021-07-14 by the reprex package (v2.0.0)

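This is related to the conversion of strings to factors in your data.frame, and it depends on your version of R: since R 4.0.0, data.frame() uses stringsAsFactors = FALSE by default, while earlier versions used TRUE. Your reprex warns that ggplot2 was built under R 3.6.2, so you are presumably on R 3.6, where src and dst become factors.
See the two cases in the next reprex. I am on R 4.1.0, so my default for stringsAsFactors is FALSE.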
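A quick way to see which case applies on a given machine is to check the column classes that data.frame() produced. The sketch below rebuilds the question's dF without setting stringsAsFactors, so the result depends on the running R version: character columns on R 4.0.0 or later, factors on earlier versions with the default options.

```r
# Rebuild the question's data frame without setting stringsAsFactors,
# so the outcome depends on the default of the running R version
dF <- data.frame(
  group = c("A", "A", "B"),
  src = c("O1", "O2", "O3"),
  dst = c("C1", "C2", "C1")
)

# First class of every column: "character" on R >= 4.0.0,
# "factor" on older versions with the default stringsAsFactors = TRUE
col_classes <- vapply(dF, function(x) class(x)[1], character(1))
print(getRversion())
print(col_classes)
```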

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dF1 <- data.frame(
  group = c("A","A","B"),
  src = c("O1","O2", "O3"),
  dst = c("C1","C2","C1"),
  stringsAsFactors = TRUE  # character columns become factors (pre-R 4.0.0 default)
)

dF1 <- dF1 %>%
  mutate(
    value = group == "A",
    value = if_else(value == TRUE, src, dst)
  )
#> Warning in `[<-.factor`(`*tmp*`, i, value = structure(1L, .Label = c("C1", :
#> invalid factor level, NA generated

dF1
#>   group src dst value
#> 1     A  O1  C1    O1
#> 2     A  O2  C2    O2
#> 3     B  O3  C1  <NA>

dF2 <- data.frame(
  group = c("A","A","B"),
  src = c("O1","O2", "O3"),
  dst = c("C1","C2","C1"),
  stringsAsFactors = FALSE  # character columns stay character (R >= 4.0.0 default)
)

dF2 <- dF2 %>%
  mutate(
    value = group == "A",
    value = if_else(value == TRUE, src, dst)
  )

dF2
#>   group src dst value
#> 1     A  O1  C1    O1
#> 2     A  O2  C2    O2
#> 3     B  O3  C1    C1

Created on 2021-07-15 by the reprex package (v2.0.0)
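As its text shows, the warning comes from `[<-.factor`: assigning a value that is not one of a factor's levels stores NA instead. As far as I can tell, if_else() in dplyr 1.0.x starts from `true` (src) and then assigns the `false` values (dst) into it, so in dF1 the result only knows the levels O1, O2 and O3, and the dst value C1 is rejected. The minimal base-R sketch below reproduces that step with a hypothetical vector f standing in for the intermediate result; in dF2 the columns are character vectors, which have no fixed set of allowed values.

```r
# Hypothetical stand-in for the intermediate if_else() result in dF1:
# a factor whose levels come from src only
f <- factor(c("O1", "O2", "O3"))
levels(f)

# Assigning a value that is not among the levels triggers
# "invalid factor level, NA generated" and stores NA
f[3] <- "C1"
f

# A character vector accepts any string, so the same assignment works
s <- c("O1", "O2", "O3")
s[3] <- "C1"
s
```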

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2021-07-15                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.1.6   2020-04-05 [1] CRAN (R 4.0.0)
#>  cli           2.4.0   2021-04-05 [1] CRAN (R 4.0.3)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.5)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.5)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.0.5)
#>  rmarkdown     2.9     2021-06-15 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.6.2   2021-05-17 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.1.0)
#>  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.0.5)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.0.5)
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 4.0.0)
#>  vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.3)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.3)
#>  xfun          0.24    2021-06-15 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] D:/tools/R/Packages
#> [2] D:/tools/R/R-4.1.0/library

Many thanks @HanOostdijk for your help. You're right, my issue was easily solved by just changing the factors to strings. :blush:
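Changing the factors to strings can be done in one step for an existing data frame. The sketch below assumes dplyr 1.0.0 or later (for across() and where()) and starts from a data frame built with stringsAsFactors = TRUE, which mimics data.frame() on R versions before 4.0.0.

```r
library(dplyr)

# Reproduce the asker's situation: factor columns, as created by
# data.frame() on R < 4.0.0 with the default settings
dF <- data.frame(
  group = c("A", "A", "B"),
  src = c("O1", "O2", "O3"),
  dst = c("C1", "C2", "C1"),
  stringsAsFactors = TRUE
)

dF <- dF %>%
  # Convert every factor column to character first
  mutate(across(where(is.factor), as.character)) %>%
  # if_else() now combines two character vectors, so no NA is generated
  mutate(value = if_else(group == "A", src, dst))

dF
```

If the columns need to stay factors, converting only the two inputs, as in `if_else(group == "A", as.character(src), as.character(dst))`, avoids the NA as well.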

Have a great day!
