PCRE for stringr regex like grep or sub?

Hi all,

In the code below, str_replace fails but sub succeeds. My goal is to find Fortran code (which in this case is written the same way as in R) and replace it with code that works in Rust. So I want to replace exp(x) with x.exp(). However, x may itself be complex, with embedded (), so the regular expression has to be recursive in order to capture only the content between the outer () associated with exp.

Yes, I could stick with sub, but I'm trying to figure out why tidyverse is failing here. I'm not sure what engine it uses for regex, but it doesn't seem to be PCRE, which is triggered by perl = TRUE in the sub function. Is there a way to trigger PCRE for stringr functions?

string <- "CL = CLs * exp((pkvisit-1) * theta1) * (WT/70)**0.75"
pattern <- "exp(\\((?:[^)(]+|(?1))*+\\))"
replace <- "\\1\\.exp\\(\\)"

stringr::str_replace(string, pattern, replace) #fails
#> Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), : Syntax error in regex pattern. (U_REGEX_RULE_SYNTAX, context=`exp(\((?:[^)(]+|(?1))*+\))`)
sub(pattern, replace, string, perl = TRUE) #works
#> [1] "CL = CLs * ((pkvisit-1) * theta1).exp() * (WT/70)**0.75"

Created on 2024-03-27 with reprex v2.1.0

Best I can tell, stringr calls stringi, which uses the ICU regex engine. It seems recursive expressions, and in particular subroutines, are not implemented, so the problem is with (?1).

From the stringi documentation, it seems you can't switch the engine in stringi, and that's by design:

Base R gives access to two different regex matching engines [...]
• PCRE (Perl-compatible regular expressions); activated when perl=TRUE is set.
[...]
Stringi, on the other hand, provides access to the regex engine implemented in ICU, which
was inspired by Java’s util.regex in JDK 1.4. Their syntax is mostly compatible with that
of PCRE, although certain more advanced facets might not be supported (e.g., recursive
patterns).
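
If staying within stringr matters, one possible workaround is to drop the recursion and spell out a fixed nesting depth instead. The following is only a minimal sketch: pattern_flat is a name I made up, and it assumes that the argument of exp contains at most one level of nested parentheses, which happens to be true for the example string but is not true in general. The pattern uses no (?1), so ICU should accept it; the base sub call is there to check the pattern itself, and the stringr call is guarded in case the package is not installed.

```r
string <- "CL = CLs * exp((pkvisit-1) * theta1) * (WT/70)**0.75"

# Assumption: at most ONE level of nested parentheses inside exp(...).
# Group 1 = "(" + any mix of non-paren characters or a flat "(...)" + ")".
# No recursion, so the pattern is valid in both PCRE and ICU.
pattern_flat <- "exp(\\((?:[^\\(\\)]|\\([^\\(\\)]*\\))*\\))"

# Check the pattern with base R first (PCRE engine).
base_result <- sub(pattern_flat, "\\1.exp()", string, perl = TRUE)
print(base_result)

# Same pattern through stringr (ICU engine), if the package is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  icu_result <- stringr::str_replace(string, pattern_flat, "\\1.exp()")
  print(icu_result)
}
```

Each additional nesting level would need another hand-written alternative in the pattern, so for arbitrarily nested input the recursive PCRE pattern with sub remains the more robust choice.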

Thanks for finding that documentation. I had looked at stringi as well, but did not find a way to change the regex engine, which you have confirmed.

That's too bad, as the whole sub family is not tidyverse compatible because x is not the first argument in any of them. It would be great if stringr::regex had an option to use PCRE. Of course, now that I look at the help for regex again, I see that it actually says "uses ICU expressions." Duh on me.
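
For the argument-order issue, a thin wrapper that puts the string first is probably the simplest way to keep PCRE in a pipeline. This is just a sketch: sub_pcre is a hypothetical helper name, not a function from stringr or base R, and it assumes R 4.1 or later for the native |> pipe.

```r
# Hypothetical helper: same as sub(..., perl = TRUE), but with the string
# as the first argument so it can sit in a pipeline.
sub_pcre <- function(string, pattern, replacement) {
  sub(pattern, replacement, string, perl = TRUE)
}

string <- "CL = CLs * exp((pkvisit-1) * theta1) * (WT/70)**0.75"
pattern <- "exp(\\((?:[^)(]+|(?1))*+\\))"  # recursive pattern from the original post

result <- string |> sub_pcre(pattern, "\\1.exp()")
print(result)
#> [1] "CL = CLs * ((pkvisit-1) * theta1).exp() * (WT/70)**0.75"
```

On R 4.2 or later, the native pipe placeholder should also work without a wrapper, along the lines of string |> sub(pattern, replace, x = _, perl = TRUE).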

