Replace multiple values in a column with a new value

Hi,

I have a column in my data with about 375 different values. I want to collapse those values into about 10 categories, such as Urine, Blood, Wound, Skin, Respiratory, and so on.
For example, change:
'Blood', 'Bld', 'Blood sample', 'Bld specimen', ... to "Blood"
'Ur', 'Urine sample', 'Urine', 'Urinary specimen', 'Urine Catheter' to "Urine"
'Wnd', 'Wound swab', 'Wound', 'Scar wound', 'wound' to "Wound"
All other values to "Other"
I've seen this code online:

df %>%
  mutate(var1 = recode(var1, 'oldvalue1' = 'newvalue1', 'oldvalue2' = 'newvalue2'),
         var2 = recode(var2, 'oldvalue1' = 'newvalue1', 'oldvalue2' = 'newvalue2'))

My question: is there an easier way to do this? With about 375 values, typing out every old/new pair takes a lot of time.
Thank you

When you start needing to manage hundreds of recodings, in my opinion, that's the time to stop typing the recodings out in code and move to a code-as-data approach: define the replacements in a simple data frame, join your data to it, and manipulate the result with minimal code. The data describing the translations should be kept separate from the simple code that applies them.
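As a rough illustration of the lookup-table idea, here is a minimal base R sketch. The names `lookup`, `old`, `new`, `df`, and `var1_clean` are made up for the example, and the sample values come from the question; `match()` stands in for the join (with dplyr, `left_join()` would play the same role). Anything missing from the table falls through to "Other".

```r
# Hypothetical lookup table: one row per raw value and the category it maps to.
# In practice this could live in a CSV file and be loaded with read.csv().
lookup <- data.frame(
  old = c("Blood", "Bld", "Blood sample", "Bld specimen",
          "Ur", "Urine sample", "Urine", "Urinary specimen", "Urine Catheter",
          "Wnd", "Wound swab", "Wound", "Scar wound", "wound"),
  new = c(rep("Blood", 4), rep("Urine", 5), rep("Wound", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical data; "Hair" has no row in the lookup table.
df <- data.frame(
  var1 = c("Bld", "Urine Catheter", "wound", "Hair", "Blood sample"),
  stringsAsFactors = FALSE
)

# match() gives the lookup row for each value, or NA when there is none.
idx <- match(df$var1, lookup$old)

# Unmatched values become "Other"; matched values take the mapped category.
df$var1_clean <- ifelse(is.na(idx), "Other", lookup$new[idx])
df$var1_clean
```

Adding or correcting a mapping then means editing a row of the table, not the code.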


A column in a data frame can be extracted as a vector, and a vector can be modified with gsub(). I'd do it this way as a first stab:

v <- sample(c("All","Bld","Bld","Blood","Blood","Blood","Blood","Catheter","change","example","For","other","Other","Respiratory","sample","sample","Scar","Skin","specimen","specimen","swab","Ur","Urinary","Urine","Urine","Urine","Urine","Urine","values","Wnd","Wound","Wound","Wound","wound","wound","Wound"))
tokens <- v |> tolower() |> unique() |> sort()
targets <- c("blood","skin","urine","wound")
(to_sift <- tokens[!(tokens %in% targets)])
#>  [1] "all"         "bld"         "catheter"    "change"      "example"    
#>  [6] "for"         "other"       "respiratory" "sample"      "scar"       
#> [11] "specimen"    "swab"        "ur"          "urinary"     "values"     
#> [16] "wnd"
# pick cases to be handled
blood <- to_sift[2]
urine1 <- to_sift[13]
urine2 <- to_sift[14]
wound <- to_sift[16]

d <- data.frame(v = tolower(v))

reform <- function(x){
  # feed each result into the next call; otherwise only the last gsub() is returned
  # anchor with ^ and $ so that "ur" is not also replaced inside "urine"
  x <- gsub(paste0("^", blood, "$"), "blood", x)
  x <- gsub(paste0("^", urine1, "$"), "urine", x)
  x <- gsub(paste0("^", urine2, "$"), "urine", x)
  gsub(paste0("^", wound, "$"), "wound", x)
}

to_process <- data.frame(stuff = reform(d$v))
recoded <- ifelse(to_process$stuff %in% targets, to_process$stuff, "other")
# element order depends on the unseeded sample() shuffle, so show counts instead
table(recoded)
#> recoded
#> blood other  skin urine wound
#>     6    15     1     7     7

Created on 2023-06-20 with reprex v2.0.2
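The example above works on single-word tokens, while the values in the question are often multi-word ('Urine Catheter', 'Scar wound'). One possible variant, sketched here under the assumption that each raw value contains a recognisable keyword, is to search the whole string with one regular expression per category. The vector `x` and the `patterns` list are illustrative and would need to be extended to cover the real 375 values.

```r
# Hypothetical raw values, taken from the examples in the question.
x <- c("Blood", "Bld specimen", "Urine Catheter", "Ur",
       "Wnd", "Scar wound", "Skin", "Hair")

# One regular expression per target category; the names are the new values.
# \\b is a word boundary, so "ur" only matches as a whole word.
patterns <- c(
  Blood = "\\b(blood|bld)\\b",
  Urine = "\\b(ur|urine|urinary)\\b",
  Wound = "\\b(wound|wnd)\\b",
  Skin  = "\\bskin\\b"
)

# Start with everything as "Other", then fill in categories one at a time.
recoded <- rep("Other", length(x))
for (category in names(patterns)) {
  # only touch values that no earlier pattern has claimed
  hit <- recoded == "Other" &
    grepl(patterns[[category]], x, ignore.case = TRUE, perl = TRUE)
  recoded[hit] <- category
}
recoded
```

A value that matches more than one pattern gets the first category listed, so the order of `patterns` matters. It is worth tabulating whatever ends up as "Other" to spot keywords that were missed.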
