Grepl issue with brackets or "~"?

Hi, I have this simple data file where I try to create a ModelCat variable based on the string values in ModelLong. Strings without brackets and blanks are coded properly, whereas the two expressions with brackets are not. Do you know what I am doing wrong?

library(dplyr)
Sales.data.t <- data.frame(stringsAsFactors = FALSE,
                           ModelLong = c(NA, "bbbb", "aa(2014~)", "aa(2014 ~)", "bb")
)
Sales.data.t

# Creating new variable showing main Models
Sales.mod <- Sales.data.t %>% 
  mutate(ModelCat = case_when(
    grepl(x = ModelLong, pattern = 'aa(2014~)|aa(2014 ~)', ignore.case = TRUE) ~ 'aa (2014 ~)',
    grepl(x = ModelLong, pattern = 'bb|bbbb', ignore.case = TRUE) ~ 'bb',
    TRUE ~ "Other"
  ))

Sales.mod

I cannot find any restrictions in R documentation...

The pattern argument of grepl() has to be a regular expression, not a literal string, and in regular expressions parentheses are metacharacters with a special meaning. If you want to match a literal parenthesis, you have to escape the metacharacter with two backslashes, e.g. aa\\(2014\\s?~\\)
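
Applied to the data from the question, the escaped pattern could look like the sketch below. It sticks to base R so that it runs without extra packages, which means nested `ifelse()` calls stand in for `case_when()`; the helper names `unescaped`, `pattern_aa`, `is_aa` and `is_bb` are made up for illustration. The same escaped pattern should work unchanged inside the original `grepl()` call within `case_when()`.

```r
# Sample data from the question
Sales.data.t <- data.frame(stringsAsFactors = FALSE,
                           ModelLong = c(NA, "bbbb", "aa(2014~)", "aa(2014 ~)", "bb"))

# Unescaped: ( ) form a group, so the regex searches for the text "aa2014~",
# which does not occur in any of the strings
unescaped <- grepl("aa(2014~)", Sales.data.t$ModelLong, ignore.case = TRUE)

# Escaped: \\( and \\) match literal parentheses, \\s? allows an optional space
pattern_aa <- "aa\\(2014\\s?~\\)"
is_aa <- grepl(pattern_aa, Sales.data.t$ModelLong, ignore.case = TRUE)
is_bb <- grepl("bb", Sales.data.t$ModelLong, ignore.case = TRUE)

# grepl() returns FALSE for NA, so the missing value falls through to "Other"
Sales.mod <- Sales.data.t
Sales.mod$ModelCat <- ifelse(is_aa, "aa (2014 ~)",
                             ifelse(is_bb, "bb", "Other"))
Sales.mod
```

Note that 'bb|bbbb' in the original code can be shortened to 'bb', because grepl() looks for the pattern anywhere in the string and "bbbb" already contains "bb".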

Once again, your question is not related to R but to regular expressions, so please invest a little time in learning them. It's not that hard.

R describes how its regular expressions work in the regex help file (just use the command ?regex). It points out that these metacharacters are not always treated literally in a regular expression: . \ | ( ) [ { ^ $ * + ?.
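
As a rough illustration of two alternatives to backslash escaping, both covered in ?regex and ?grepl: `fixed = TRUE` makes grepl() treat the whole pattern as a literal string, and inside a bracket expression such as `[(]` the parenthesis loses its special meaning. The vector `x` and the result names are placeholders for this sketch.

```r
x <- c(NA, "bbbb", "aa(2014~)", "aa(2014 ~)", "bb")

# fixed = TRUE: every character of the pattern is taken literally.
# There is no alternation with |, so each variant needs its own call,
# and ignore.case cannot be combined with fixed = TRUE.
lit_hit <- grepl("aa(2014~)", x, fixed = TRUE) |
  grepl("aa(2014 ~)", x, fixed = TRUE)

# Bracket expressions: [(] and [)] match literal parentheses,
# and " ?" allows an optional space before the tilde
class_hit <- grepl("aa[(]2014 ?~[)]", x)

lit_hit
class_hit
```

Both vectors flag only the third and fourth elements; which style reads better is mostly a matter of taste.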

Regex can be difficult, but luckily there are "fun" ways to learn it (modulo your definition of fun). Regex golf is a game where you have to write a single regular expression that matches every string in one list while not matching any string in another.


Extra advice unrelated to your question: If you use the stringi or stringr packages for handling regex (which I would definitely recommend), they use the "ICU flavor" of regex. The ICU's guide describes the special metacharacters. The ICU system handles Unicode very well, which means it's much simpler to work with languages that don't just use ASCII characters.
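
For comparison, here is a minimal stringr sketch of the same match, assuming the stringr package is installed. `str_detect()`, `regex()` and `fixed()` are stringr functions; the vector `x` and the result names are placeholders.

```r
library(stringr)

x <- c(NA, "bbbb", "aa(2014~)", "aa(2014 ~)", "bb")

# regex() wraps a pattern and carries options such as case-insensitivity
hit_regex <- str_detect(x, regex("aa\\(2014\\s?~\\)", ignore_case = TRUE))

# fixed() matches the pattern as a literal string, so no escaping is needed
hit_fixed <- str_detect(x, fixed("aa(2014~)"))

# Unlike grepl(), str_detect() returns NA for a missing input
hit_regex
hit_fixed
```

The NA in the first position is worth keeping in mind when switching from grepl(), which returns FALSE there.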
