Mutate and replace strings to new column

Glad you worked out a solution! Here are a few alternative ideas that might be a bit more streamlined. When recoding variables like this, I strongly favor readability and future maintainability: I don't want it to be a mystery to future-me (or anybody else) where and how the data coding decisions were made.

Set up test data frame

library(tidyverse)

mrgb_trus <- data.frame(
  MRGB_gleason = c("3+4", "4", "3+4", "4+4", "3+3", NA, "3+4", "3+3", NA, "4+3",
    "3+3", "3+4", "3+4", NA, "3", "3+4", NA, NA, NA, NA, "4+3", "3+4", "3+3", 
    "4+3", "4+4", "4+5", "3+3", "4+3", "4+3", NA, NA, "3+3", "4+4", "3+4", "4+5", 
    "3+3", "5+4", NA, NA, "3+4", "4+3", NA, "3+3", "4+3", "3+4", "3+4", "3+4", NA, 
    "4+4", "4+3", "3+4", "3+4"), 
  stringsAsFactors = FALSE)

Option 1: case_when()

mrgb_trus_case_when <- mrgb_trus %>% 
  mutate(
    MRGGG = case_when(
      is.na(MRGB_gleason) ~ "0",
      MRGB_gleason == "3" ~ "1",
      MRGB_gleason == "4" ~ "1",
      MRGB_gleason == "3+3" ~ "1",
      MRGB_gleason == "3+4" ~ "2",
      MRGB_gleason == "4+3" ~ "3",
      MRGB_gleason == "4+4" ~ "4",
      MRGB_gleason == "4+5" ~ "5",
      MRGB_gleason == "5+4" ~ "5",
      MRGB_gleason == "5+5" ~ "5"
    )
  )
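
One thing to keep in mind with case_when(): any value that matches none of the conditions comes back as NA, which is easy to miss. A minimal sketch of an explicit fallback is below. The "unmapped" label and the small `scores` data frame are hypothetical, only there for illustration, and "3+5" simply stands in for a score the conditions don't cover.

```r
library(dplyr)

# Hypothetical mini data set; "3+5" is not covered by any condition below
scores <- tibble(MRGB_gleason = c("3+4", NA, "3+5"))

recoded <- scores %>%
  mutate(
    MRGGG = case_when(
      is.na(MRGB_gleason) ~ "0",
      MRGB_gleason == "3+4" ~ "2",
      TRUE ~ "unmapped"   # fallback: flags anything not handled above
    )
  )

recoded$MRGGG
```

Filtering on the fallback label afterwards gives you a quick list of scores that still need a coding decision.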

Option 2: Join with a lookup table

The lookup table below is built inline with tribble(), but to maximize maintainability you could store it as a CSV and read it in as needed. That way nobody has to go digging around inside the code to add translations, and the CSV itself can be stored along with other project metadata.

mrgb_lookup <- tribble(
  ~ gleas_score, ~ gleas_grd_grp,
    NA,            "0",
    "3",           "1",
    "4",           "1",
    "3+3",         "1",
    "3+4",         "2",
    "4+3",         "3",
    "4+4",         "4",
    "4+5",         "5",
    "5+4",         "5",
    "5+5",         "5"
)

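# inner_join() keeps only rows whose score appears in the lookup table;
# NA matches the NA row because dplyr joins treat NA keys as equal by default
mrgb_trus_inner_join <- mrgb_trus %>% 
  inner_join(mrgb_lookup, by = c("MRGB_gleason" = "gleas_score")) %>% 
  rename("MRGGG" = "gleas_grd_grp")   # rename the column carried over from the lookup table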
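
If you go the CSV route, it could look roughly like this. This is only a sketch: the file name `gleason_lookup.csv` is made up, and a temp file stands in for wherever you would actually keep the lookup. It also uses left_join() rather than inner_join(), so a score missing from the lookup stays in the data (with NA) instead of being dropped.

```r
library(readr)
library(dplyr)

# Hypothetical location; in practice this file would live with the project metadata
lookup_path <- file.path(tempdir(), "gleason_lookup.csv")

# Write a small lookup file so the example is self-contained
write_lines(
  c("gleas_score,gleas_grd_grp",
    "NA,0", "3,1", "4,1", "3+3,1", "3+4,2",
    "4+3,3", "4+4,4", "4+5,5", "5+4,5", "5+5,5"),
  lookup_path
)

# Read every column as character so "3" and "4" are not parsed as numbers;
# read_csv() treats the string "NA" as a missing value by default
mrgb_lookup_csv <- read_csv(
  lookup_path,
  col_types = cols(.default = col_character())
)

scores <- tibble(MRGB_gleason = c("3+4", "4", NA, "5+4"))

recoded <- scores %>%
  left_join(mrgb_lookup_csv, by = c("MRGB_gleason" = "gleas_score")) %>%
  rename(MRGGG = gleas_grd_grp)

recoded$MRGGG
```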

Both of these methods produce the same results as your solution:

mrgb_trus_3step <- mrgb_trus %>% 
  mutate(
    MRGGG = str_replace_all(
      MRGB_gleason, 
      c("3\\+3" = "1", "3\\+4" = "2", 
        "4\\+3" = "3", "4\\+4" = "4", 
        "4\\+5" = "5", "5\\+4" = "5", 
        "5\\+5" = "5")
    ),
    MRGGG = replace(MRGGG, is.na(MRGGG), "0"),   # "0" as a string, to match the character column
    MRGGG = replace(MRGGG, MRGB_gleason == "3" | MRGB_gleason == "4", "1")
  )

identical(
  mrgb_trus_3step$MRGGG, 
  mrgb_trus_case_when$MRGGG
)
#> [1] TRUE

identical(
  mrgb_trus_3step$MRGGG, 
  mrgb_trus_inner_join$MRGGG
)
#> [1] TRUE

Notes:

  • As seen above, you can put all your mutate steps in a single call to mutate() — it applies the changes sequentially, so later steps in a single call get the updated values from earlier steps.
  • Eventually, you will probably want to convert your Gleason Grade Group values into an ordered factor, so that sorting and comparisons follow the grade order.
  • You might be interested in the questionr package. It has some really neat interactive RStudio add-ins that help you build variable recoding code — see the vignette here: https://juba.github.io/questionr/articles/recoding_addins.html
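
To illustrate the first two notes together, here is a small sketch (with a made-up `demo` data frame) where a later step in the same mutate() call picks up the column created by the earlier step and turns it into an ordered factor. Whether "0" (the code used here for a missing score) really belongs at the bottom of the ordering is a judgment call for your data; I've included it only to keep the example simple.

```r
library(dplyr)

# Hypothetical mini data set for illustration
demo <- tibble(MRGB_gleason = c("3+4", "3+3", NA, "4+5", "4+3"))

demo_fct <- demo %>%
  mutate(
    MRGGG = case_when(
      is.na(MRGB_gleason) ~ "0",
      MRGB_gleason == "3+3" ~ "1",
      MRGB_gleason == "3+4" ~ "2",
      MRGB_gleason == "4+3" ~ "3",
      MRGB_gleason == "4+5" ~ "5"
    ),
    # This step sees the MRGGG created just above, within the same mutate() call
    MRGGG = factor(MRGGG, levels = as.character(0:5), ordered = TRUE)
  )

# Ordered factors support comparisons against a level
high_grade <- demo_fct$MRGGG >= "3"
```

Listing all levels explicitly keeps grade groups that don't occur in the data (here "4") as valid levels, so tables and plots stay consistent across subsets.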