find and replace exact match partial string (gsub)

Hello,

I have a data frame with a list column whose elements may contain special symbols. Here is an example of my data frame:

df_1 <- data.frame(id=1:6,
path.vec = I(list("apple", "lemon", "melon",c("apple", "banana", "melon"),
c("(Tuesday)/lemon", "lemon_Tuesday", "grape/ginger_peppers"),
c("apple", "lemon", "apple"))))
Then I recombine the path (map() comes from purrr):

library(purrr)
df_1$recombined = as.character(map(df_1$path.vec, paste, collapse = " > "))

What I want to do is find and replace exact matches in df_1$recombined.

I have been trying with gsub:

df_1$recombined <- gsub("\\<(Tuesday)/lemon\\>", "Tuesday", df_1$recombined, fixed =T)

This runs without errors, but it does not replace any value.
I have been browsing around and found that gsub doesn't work well with special characters, so I tried escaping them with backslashes:

df_1$recombined <- gsub("\\<\\(\\Tuesday\\)\\/lemon\\>", "Tuesday", df_1$recombined, fixed =T)

but that doesn't change anything either, and neither does using square brackets:

df_1$recombined <- gsub("\\<[(]Tuesday[)]/lemon\\>", "Tuesday", df_1$recombined, fixed =T)

Both run smoothly but do not change any value.
Eventually I would like the output below:

final.recombined
apple
lemon
melon
apple > banana > melon
Tuesday > Tuesday > Grape
apple > lemon > apple

I think this should be easy, but I can't understand what I'm doing wrong. Given the special characters, is gsub the best function to use in this situation? I want to replace any string containing special characters so that two words become just one.

thanks!

Hi,

Your problem is both in the matching and in the way R handles regex:

  1. The strings in your data frame do NOT contain the \< and \> that your pattern puts around the word. With fixed = T these are searched for literally, so there is no match
  2. In R, you have to use a double escape \\ in regex: the first backslash stops R from interpreting the character when it parses the string, and the second stops the regex engine from interpreting it

This is the line you now need:

gsub("\\(Tuesday\\)\\/lemon", "Tuesday", df_1$recombined)

Note that fixed = T should NOT be there, because it makes gsub treat the pattern as a literal string instead of a regex.
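
To make the difference concrete, here is a minimal sketch on a toy vector that stands in for df_1$recombined (the values are copied from the question; the variable names are just for illustration). It compares the escaped regex with a purely literal match, and shows why the original attempt found nothing:

```r
# Toy vector standing in for df_1$recombined (values taken from the question)
recombined <- c("apple",
                "(Tuesday)/lemon > lemon_Tuesday > grape/ginger_peppers")

# Regex mode: parentheses are metacharacters, so they are escaped with \\
regex_result <- gsub("\\(Tuesday\\)/lemon", "Tuesday", recombined)

# Literal mode: with fixed = TRUE nothing needs escaping at all
fixed_result <- gsub("(Tuesday)/lemon", "Tuesday", recombined, fixed = TRUE)

# The original attempt: with fixed = TRUE, \< and \> are searched for literally
no_match <- gsub("\\<(Tuesday)/lemon\\>", "Tuesday", recombined, fixed = TRUE)

print(regex_result)
print(identical(regex_result, fixed_result))  # both approaches agree
print(identical(no_match, recombined))        # the original attempt changes nothing
```

So fixed = TRUE is not wrong in itself: it is a reasonable choice when the pattern is a plain literal string, as long as the pattern contains no regex syntax such as \< and \>.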

Hope this helps,
PJ

Thank you very much, this works brilliantly!
What if I have more conditions? Can I use gsub to run them all together?
At the moment I'm calling them one by one:
df_1$recombined_1 <- gsub("\\(Tuesday\\)\\/lemon", "Tuesday", df_1$recombined)
df_1$recombined_1 <- gsub("grape\\/ginger_peppers", "Tuesday", df_1$recombined_1)

thank you very much for your help!

Just use the pipe character | (it works like an OR statement) together with groups () to match several alternatives in one call. This time do NOT escape the grouping ( and ), because here they are special grouping characters. This only works if the replacement is always the same:

gsub("(\\(Tuesday\\)\\/lemon)|(grape\\/ginger_peppers)", "Tuesday", df_1$recombined)
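
If the replacements differ per pattern, alternation alone is not enough. One possible sketch, assuming a hypothetical named vector of literal pattern/replacement pairs (the mapping below is guessed from the desired output in the question, not something the original poster confirmed), is to loop over the pairs with fixed = TRUE:

```r
# Toy vector standing in for df_1$recombined (values taken from the question)
recombined <- c("apple > banana > melon",
                "(Tuesday)/lemon > lemon_Tuesday > grape/ginger_peppers")

# Same replacement for several patterns: join the alternatives with |
same <- gsub("\\(Tuesday\\)/lemon|grape/ginger_peppers", "Tuesday", recombined)

# Different replacements: hypothetical mapping of literal string -> replacement
replacements <- c("(Tuesday)/lemon"      = "Tuesday",
                  "lemon_Tuesday"        = "Tuesday",
                  "grape/ginger_peppers" = "Grape")

different <- recombined
for (pattern in names(replacements)) {
  # fixed = TRUE treats each name as a literal string, so no escaping is needed
  different <- gsub(pattern, replacements[[pattern]], different, fixed = TRUE)
}

print(same)
print(different)
```

The pairs are applied in order, so an earlier replacement can affect what a later pattern sees; with overlapping patterns the order of the vector matters.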

By the way: a great place to build your regex with live feedback is https://regexr.com
You can paste in the text you want to search, write the expression in the top bar, and see as you type whether it works (plus you get info if something is wrong). Once finished, just copy the expression into R and add an extra \ in front of every existing \ to make sure R interprets it correctly.
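
As an alternative to doubling every backslash by hand, R 4.0.0 and later support raw strings of the form r"(...)", in which backslashes are written once. A small sketch, assuming R >= 4.0.0 (the variable names are illustrative):

```r
# Raw string (R >= 4.0.0): the regex is typed with single backslashes
pattern_raw <- r"(\(Tuesday\)/lemon|grape/ginger_peppers)"

# Classic string: every backslash has to be doubled
pattern_escaped <- "\\(Tuesday\\)/lemon|grape/ginger_peppers"

# Both spellings produce exactly the same pattern
same_pattern <- identical(pattern_raw, pattern_escaped)
print(same_pattern)

result <- gsub(pattern_raw, "Tuesday",
               "(Tuesday)/lemon > lemon_Tuesday > grape/ginger_peppers")
print(result)
```

Keep in mind that online testers may use a different regex flavor than R's default engine, so it is worth re-checking the final pattern in R itself.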

Wonderful! This is really handy. Cheers
