Enigmatic regex behaviour

Hi everyone,

I'm having trouble with a regular expression that should be fairly simple, but I just can't work out the problem.

I have a vector that looks something like this:

vec <- c("TP1.CTRL.2", "X", "TP1.CTRL.4", "X.1", "TP1.CTRL.5", "X.2")

and I want to swap all the "^(X.*)" entries (so all with an X or an X, a dot, and a number) for a new string.

I checked with a regex tester (https://regexr.com/) whether the pattern gets recognized, and everything looks good.

But when I try the following code, it doesn't work:

for(i in 1:length(vec)){
  vec[i] <- ifelse(vec[i] == "^(X.*)", 'CPM', vec[i])
}
print(vec)

Nothing gets exchanged for "CPM" and no error pops up. This means that R doesn't think the regex matches any of the vector entries, even though it should... What am I missing?

Thanks for your help!

You are not using regex, that's the mistake :)

You wrote a regex pattern, but you cannot check whether a string conforms to this pattern using ==. That operator only checks for exact equality with the literal string "^(X.*)", so it obviously evaluates to FALSE for every element in your example.
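
To see the difference on a single element, here is a small sketch comparing the two checks. The variable names `pattern`, `x`, `literal_match` and `regex_match` are only for illustration:

```r
pattern <- "^(X.*)"
x <- "X.1"

# == compares the two strings as plain text; no regex matching is involved
literal_match <- x == pattern

# grepl() interprets `pattern` as a regular expression and tests x against it
regex_match <- grepl(pattern, x)

print(literal_match)  # FALSE
print(regex_match)    # TRUE
```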

You can do something like this, but I'd suggest changing the regex. It does not really match "X, a dot, and a number"; it matches any string that starts with an X, followed by any number of arbitrary characters.

> vec <-  c("TP1.CTRL.2", "X", "TP1.CTRL.4", "X.1", "TP1.CTRL.5", "X.2")
> 
> ifelse(test = grepl(pattern = "^(X.*)",
+                     x = vec),
+        yes = "CPM",
+        no = vec)
[1] "TP1.CTRL.2" "CPM"        "TP1.CTRL.4"
[4] "CPM"        "TP1.CTRL.5" "CPM"       
> 
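
If you really only want a bare "X" or an "X" followed by a dot and a number, a stricter pattern could look like the sketch below. The pattern `^X(\\.[0-9]+)?$` is my assumption about what you intend, and `strict_pattern` and `vec_new` are just illustrative names, so adjust as needed for your real data:

```r
vec <- c("TP1.CTRL.2", "X", "TP1.CTRL.4", "X.1", "TP1.CTRL.5", "X.2")

# ^X           : the string starts with a literal X
# (\\.[0-9]+)? : optionally followed by a literal dot and one or more digits
# $            : and nothing else after that
strict_pattern <- "^X(\\.[0-9]+)?$"

# Because the pattern is anchored at both ends, sub() replaces the whole
# element when it matches and returns non-matching elements unchanged
vec_new <- sub(strict_pattern, "CPM", vec)
print(vec_new)

# Unlike "^(X.*)", this does not match other names that merely start with X
strict_check <- grepl(strict_pattern, c("X", "X.12", "Xylose", "X."))
print(strict_check)
```

Since `sub()` and `grepl()` are both vectorised, neither the for loop nor `ifelse()` is strictly needed here.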

Hope this helps.

It did help, thanks a lot - I'll remember to use grepl rather than == when working with regex!
