Replacing text using vectors

Hi everyone, I have created the following vectors to fix commonly misspelled words. For each vector, I want to replace any of the terms on the right-hand side with the vector's name; these terms appear many times in a column of a table I am trying to clean. I can't figure it out... grateful for any advice you can provide. Thank you!

cannabis <- c("cannabis")
`spice/PS` <- c("spice", "nps", "synthetic cannabis", "synthetic cannabinoid", "synthetic cannaboid", "mamba", "psychoactive")
tranquilisers <- c("zopli", "zopi", "zolpi", "ziplocane", "xanax", "xanex", "zispin", "mirtazapine", "mertazipine", "benzodiazepines", "valium", "diazapam", "diazepam", "tradazone", "trazodone", "ketamine")

Hi,
The problem with that kind of substitution is that accessing the variable names you defined gets a little tricky. I recommend that you create a named list() instead, and then, within a for loop over its names(), substitute the matching values in your column vector (or in a new one).

So I would do something like the following: create a categories object holding the vectors you posted. Given that your table is named data and your column is compound, cleanCompound will hold the result, which you can append to the table or use to replace the column.

categories <- list(
    cannabis = c("cannabis"),
    spice = c(...),
    ## the rest of the categories...
)

# Ensure character vector for the result
cleanCompound <- as.character(data$compound)

## Substitute each group
for (categ in names(categories)) {
    n <- which(cleanCompound %in% categories[[categ]])
    # Skip the assignment when nothing matches
    if (length(n) > 0L) {
        cleanCompound[n] <- rep(categ,length(n))
    }
}

## If you want to replace the original column in the table
data$compound <- cleanCompound
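To check the loop approach end to end, here is a self-contained sketch on a small made-up table. The data frame `data`, its `compound` values, and the shortened category vectors are hypothetical. The `tolower(trimws())` step is an added assumption that raw entries may differ in case or surrounding spaces, since `%in%` only matches exact strings.

```r
# Hypothetical categories, shortened from the vectors in the question
categories <- list(
  cannabis = c("cannabis"),
  spice = c("spice", "nps", "synthetic cannabis", "mamba"),
  tranquilisers = c("xanax", "xanex", "valium", "diazapam", "diazepam")
)

# Hypothetical table with messy entries
data <- data.frame(
  compound = c("Cannabis", " xanex", "mamba", "heroin", "Diazapam"),
  stringsAsFactors = FALSE
)

# Assumption: normalise case and whitespace so %in% can find exact matches
cleanCompound <- tolower(trimws(as.character(data$compound)))

for (categ in names(categories)) {
  # Positions whose value is one of this category's spellings
  n <- which(cleanCompound %in% categories[[categ]])
  if (length(n) > 0L) {
    cleanCompound[n] <- categ
  }
}

data$compound <- cleanCompound
print(data$compound)
```

Values that match no category ("heroin" here) are left as they are, so the cleaned column still shows what was not recognised.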

Thanks very much for the prompt answer, I will try that.

There is also a technique using a named vector; consider this example:

input <- "chat"

translation <- c("cat" = "felis", "chat" = "felis", "gato" = "felis", "dog" = "canis")

if(!is.na(translation[input])) input <- translation[input]

cat(input)

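The lookup translation[input] is fully vectorized and requires a bit less code, but it needs a named vector to work. Note that the if() test above only handles a single input value; for a whole column, replace only the positions where the lookup is not NA.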
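A minimal sketch of applying the named-vector idea to a whole column, assuming the categories are kept in a named list like the one in the earlier answer (the list contents and the `compound` vector below are hypothetical). The list is flattened into one lookup vector whose names are the spellings and whose values are the category names.

```r
# Hypothetical category list, in the same shape as the earlier answer
categories <- list(
  cannabis = c("cannabis"),
  spice = c("spice", "nps", "mamba"),
  tranquilisers = c("xanax", "xanex", "valium")
)

# Named vector: names are the spellings, values are the category
lookup <- setNames(
  rep(names(categories), lengths(categories)),
  unlist(categories, use.names = FALSE)
)

# Hypothetical column to clean
compound <- c("xanex", "mamba", "heroin", "cannabis", NA)

# One indexing call translates the whole vector; unmatched values give NA
translated <- unname(lookup[compound])

# Keep the original value wherever there was no translation
cleanCompound <- ifelse(is.na(translated), compound, translated)
print(cleanCompound)
```

Unlike the loop, this never re-matches a value that has already been replaced by a category name, because every lookup is done against the original spellings.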

Other approaches could be built with the dplyr::*_join() family of functions.
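One possible join-based version, sketched under the assumption that the dplyr package is installed; the category list and the `data` table are hypothetical. The list is turned into a two-column lookup table, left_join() attaches the category to each row, and coalesce() falls back to the original value when there is no match.

```r
library(dplyr)

# Hypothetical category list
categories <- list(
  cannabis = c("cannabis"),
  spice = c("spice", "nps", "mamba"),
  tranquilisers = c("xanax", "xanex", "valium")
)

# Long lookup table: one row per spelling
lookup <- data.frame(
  compound = unlist(categories, use.names = FALSE),
  category = rep(names(categories), lengths(categories)),
  stringsAsFactors = FALSE
)

# Hypothetical table to clean
data <- data.frame(
  compound = c("xanex", "mamba", "heroin", "cannabis"),
  stringsAsFactors = FALSE
)

# left_join() keeps every row of data; unmatched rows get NA in category
data <- data %>%
  left_join(lookup, by = "compound") %>%
  mutate(compound = coalesce(category, compound)) %>%
  select(-category)

print(data$compound)
```

Each spelling should appear in only one category; otherwise the join matches it more than once and duplicates the corresponding rows.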
