Replacing old values by new with mutate

Hi,
I'm trying to replace missing or incorrect values in one string variable (ModelLong) if specific conditions are met

Now I need to fix or replace "ModelLong" existing values by new using following rules:

  1. I need to replace all "aa"s by "aa (new)" or "aa (old)" depending on "Year"
  2. I need to replace all "cc"s by relevant full name depending on "VINChar"
  3. I need to replace all blank "ModelLong" by relevant full name depending on "VINChar"

I have prepared the code below but I have errors I cannot fix.

library(dplyr)

Sales.data <- data.frame(stringsAsFactors = FALSE,
        Year = c(2020, 2011, 2020, 2020, 2020, 2019, 2020, 2020, 2020, 2019,
                 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2008, 2019, 2020,
                 2020, 2020),
         URN = c("aaa", "ccc", "a1", "b1", "ccc1", "b11", "aaa3",
                           "b3", "b11", "c11", "bb3", "aa31", "cc31", "bb11",
                           "ccc1", "aaf", "bb", "cf", "aaa1", "bb1", "aaf2",
                           "bf2"),
   ModelLong = c("cc", "aa", "aa(2014~)", "ccN", "bb(2014~)", NA, NA,
                           "CC", "cc", "bb(2014~)", "cc", "aa", "bb(2014~)",
                           NA, "cc", "ccN", "cc", "aa", "bb(2014~)", "bb(2014~)",
                           "ccN", "aa(2014~)"),
     VINChar = c("H3", "A8", "A7", "HC", "B3", "h2", "A8", "H3",
                           "H2", "B3", "HC", "a8", "B3", "B3", "D1", "HC",
                           "h3", "A8", "B3", "B3", "HC", "A8")
)

Sales.data <- Sales.data %>% 
  mutate_if(ModelLong="aa") # using Year to replace 'aa' by 'aa (new)' or 'aa (old)'
mutate(ModelLong = case_when(
  Year >= 2014 ~ "aa (new)",
  Year < 2014 ~ "aaa (old)"
)) %>%
  mutate_if(ModelLong="cc") # using VINChar to replace 'cc' by correct ModelLong code
mutate(ModelLong = case_when(
         VINChar == 'D1' ~ "cc (2012-2015)",
         VINChar == 'H2' ~ "cc (2016 ~ )",
         VINChar == 'H3' ~ "cc (2016 ~ )"
       )) %>%
  mutate_if(is.na(x = ModelLong)) # using VINChar to create ModelLong code for records with ModelLong code missing
mutate(ModelLong = case_when(
  VINChar == 'D1' ~ "cc (2012-2015)",
  VINChar == 'H2' ~ "cc (2016 ~ )",
  VINChar == 'H3' ~ "cc (2016 ~ )",
  VINChar == 'B3' ~ "ddd",
  VINChar == 'A8' ~ "aaa"))

Also, all statements above cannot be case sensitive. Can you help please?

Hi @Slavek

First you forgot some %>% at the end of your mutate_if.
Second, you didn't use well mutate_if. In mutate_if the condition is not in the value of a column but it is a condition for a characteristic of whole colmuns (eg : mutate_if (data, is.numeric, as_factor)).

I don't understand well what u want to do with the mutate_if, but I think u can put it in your case_when. Something like that .

(ModelLong = case_when( ModelLong == "aa" and Year >= 2014 ~ "aa (new), ..."

Hope this help

I thought mutate_if is equivalent of "do if" so rather than repeating "...ModelLong == "aa" …" in each line I thought we could select specific records mutate relates to so:

  1. If model is "aa" do this mutation:
    mutate(ModelLong = case_when(
    Year >= 2014 ~ "aa (new)",
    Year < 2014 ~ "aaa (old)")

  2. If model is "cc" do this mutation:
    mutate(ModelLong = case_when(
    VINChar == 'D1' ~ "cc (2012-2015)",
    VINChar == 'H2' ~ "cc (2016 ~ )",
    VINChar == 'H3' ~ "cc (2016 ~ )")

  3. If ModelLong information is missing do this:
    mutate(ModelLong = case_when(
    VINChar == 'D1' ~ "cc (2012-2015)",
    VINChar == 'H2' ~ "cc (2016 ~ )",
    VINChar == 'H3' ~ "cc (2016 ~ )",
    VINChar == 'B3' ~ "ddd",
    VINChar == 'A8' ~ "aaa"))

Is it doable?

It is doable, but not through the base dplyr functions. You can create a mutate_when function (see this gist and the code below) that will accomplish what you want.

Also, I've used grepl since it allows the conditions to be case-insensitive and you have more options for pattern matching since it uses RegEx.

library(dplyr)

# See https://gist.github.com/romainfrancois/eeeed972d6734bcad3ec3dcf872df7ea
mutate_when <- function(data, condition, ...){
  condition <- rlang::enquo(condition)
  
  dots <- rlang::exprs(...)
  
  expressions <- purrr::map2( dots, syms(names(dots)), ~{
    quo( case_when(..condition.. ~ !!.x , TRUE ~ !!.y ) )
  })
  
  data %>%
    mutate( ..condition.. = !!condition ) %>%
    mutate( !!!expressions ) %>%
    select( -..condition..)
}

Sales.data <- data.frame(stringsAsFactors = FALSE,
                         Year = c(2020, 2011, 2020, 2020, 2020, 2019, 2020, 2020, 2020, 2019,
                                  2019, 2020, 2020, 2020, 2020, 2020, 2020, 2008, 2019, 2020,
                                  2020, 2020),
                         URN = c("aaa", "ccc", "a1", "b1", "ccc1", "b11", "aaa3",
                                 "b3", "b11", "c11", "bb3", "aa31", "cc31", "bb11",
                                 "ccc1", "aaf", "bb", "cf", "aaa1", "bb1", "aaf2",
                                 "bf2"),
                         ModelLong = c("cc", "aa", "aa(2014~)", "ccN", "bb(2014~)", NA, NA,
                                       "CC", "cc", "bb(2014~)", "cc", "aa", "bb(2014~)",
                                       NA, "cc", "ccN", "cc", "aa", "bb(2014~)", "bb(2014~)",
                                       "ccN", "aa(2014~)"),
                         VINChar = c("H3", "A8", "A7", "HC", "B3", "h2", "A8", "H3",
                                     "H2", "B3", "HC", "a8", "B3", "B3", "D1", "HC",
                                     "h3", "A8", "B3", "B3", "HC", "A8")
)

Sales.data %>%
  mutate_when(grepl(x = ModelLong, pattern = "aa", ignore.case = TRUE),
              ModelLong = ifelse(Year >= 2014, "aa (new)", "aa (old)")) %>% 
  mutate_when(grepl(x = ModelLong, pattern = "cc", ignore.case = TRUE),
    ModelLong = case_when(
      grepl(x = VINChar, pattern = 'd1', ignore.case = TRUE) ~ "cc (2012-2015)",
      grepl(x = VINChar, pattern = 'h2|h3', ignore.case = TRUE) ~ "cc (2016 ~ )"
    )
  ) %>%  
  mutate_when(is.na(ModelLong),
    ModelLong = case_when(
      grepl(x = VINChar, pattern = 'd1', ignore.case = TRUE) ~ "cc (2012-2015)",
      grepl(x = VINChar, pattern = 'h2|h3', ignore.case = TRUE) ~ "cc (2016 ~ )",
      grepl(x = VINChar, pattern = 'b3', ignore.case = TRUE) ~ "ddd",
      grepl(x = VINChar, pattern = 'a8', ignore.case = TRUE) ~ "aaa"
    )
  ) %>% 
  head()
#>   Year  URN    ModelLong VINChar
#> 1 2020  aaa cc (2016 ~ )      H3
#> 2 2011  ccc     aa (old)      A8
#> 3 2020   a1     aa (new)      A7
#> 4 2020   b1         <NA>      HC
#> 5 2020 ccc1    bb(2014~)      B3
#> 6 2019  b11 cc (2016 ~ )      h2

Created on 2019-10-15 by the reprex package (v0.2.1)

Session info ``` r devtools::session_info() #> Warning in system("timedatectl", intern = TRUE): running command #> 'timedatectl' had status 1 #> ─ Session info ────────────────────────────────────────────────────────── #> setting value #> version R version 3.5.3 (2019-03-11) #> os Ubuntu 16.04.6 LTS #> system x86_64, linux-gnu #> ui X11 #> language (EN) #> collate C.UTF-8 #> ctype C.UTF-8 #> tz Etc/UTC #> date 2019-10-15 #> #> ─ Packages ────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] RSPM (R 3.5.2) #> backports 1.1.4 2019-04-10 [1] RSPM (R 3.5.2) #> callr 3.1.1 2018-12-21 [1] RSPM (R 3.5.0) #> cli 1.1.0 2019-03-19 [1] RSPM (R 3.5.2) #> crayon 1.3.4 2017-09-16 [1] RSPM (R 3.5.0) #> desc 1.2.0 2018-05-01 [1] RSPM (R 3.5.2) #> devtools 2.0.1 2018-10-26 [1] RSPM (R 3.5.2) #> digest 0.6.21 2019-09-20 [1] RSPM (R 3.5.3) #> dplyr * 0.8.3 2019-07-04 [1] RSPM (R 3.5.2) #> evaluate 0.14 2019-05-28 [1] RSPM (R 3.5.2) #> fs 1.2.6 2018-08-23 [1] RSPM (R 3.5.0) #> glue 1.3.1 2019-03-12 [1] RSPM (R 3.5.2) #> highr 0.8 2019-03-20 [1] RSPM (R 3.5.2) #> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.5.3) #> knitr 1.23 2019-05-18 [1] RSPM (R 3.5.2) #> magrittr 1.5 2014-11-22 [1] RSPM (R 3.5.0) #> memoise 1.1.0 2017-04-21 [1] RSPM (R 3.5.2) #> pillar 1.4.2 2019-06-29 [1] RSPM (R 3.5.2) #> pkgbuild 1.0.2 2018-10-16 [1] RSPM (R 3.5.2) #> pkgconfig 2.0.2 2018-08-16 [1] RSPM (R 3.5.2) #> pkgload 1.0.2 2018-10-29 [1] RSPM (R 3.5.2) #> prettyunits 1.0.2 2015-07-13 [1] RSPM (R 3.5.2) #> processx 3.3.0 2019-03-10 [1] RSPM (R 3.5.2) #> ps 1.3.0 2018-12-21 [1] RSPM (R 3.5.0) #> purrr 0.3.2 2019-03-15 [1] RSPM (R 3.5.2) #> R6 2.4.0 2019-02-14 [1] RSPM (R 3.5.2) #> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.5.2) #> remotes 2.0.2 2018-10-30 [1] RSPM (R 3.5.2) #> rlang 0.4.0 2019-06-25 [1] RSPM (R 3.5.2) #> rmarkdown 1.14 2019-07-12 [1] RSPM (R 3.5.2) #> rprojroot 1.3-2 2018-01-03 [1] RSPM (R 3.5.2) #> sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 3.5.2) #> stringi 1.4.3 2019-03-12 [1] RSPM (R 3.5.2) #> stringr 1.4.0 2019-02-10 [1] RSPM (R 3.5.2) #> testthat 2.0.1 2018-10-13 [1] RSPM (R 3.5.0) #> tibble 2.1.3 2019-06-06 [1] RSPM (R 3.5.2) #> tidyselect 0.2.5 2018-10-11 [1] RSPM (R 3.5.2) #> usethis 1.4.0 2018-08-14 [1] RSPM (R 3.5.2) #> withr 2.1.2 2018-03-15 [1] RSPM (R 3.5.0) #> xfun 0.8 2019-06-25 [1] RSPM (R 3.5.2) #> yaml 2.2.0 2018-07-25 [1] RSPM (R 3.5.0) #> #> [1] /home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.5 #> [2] /opt/R/3.5.3/lib/R/library ```
1 Like

Thank you.
I followed your code but this error appeared:

Error in mutate_when(., grepl(x = ModelLong, pattern = "aa", ignore.case = TRUE),  : 
  could not find function "mutate_when"

I've added these:

library(purrr)
library(rlang)

but still the same problem.
I've checked these:

> packageVersion("purrr")
[1] ‘0.3.2’
> packageVersion("dplyr")
[1] ‘0.8.3’
> packageVersion("rlang")
[1] ‘0.4.0’

I also think about simplifying the code as the second and the third mutate_when could be combined into something like:

mutate_when(
(is.na(ModelLong) or 
(grepl(x = ModelLong, pattern = "cc", ignore.case = TRUE)
),

Can you help please?

mutate_when is not define in these libraries. U must define the fonction mutate_when like Jdb say it. See the doc on jdb comment.

Thank you very much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.