Understanding the use of str_replace_all with multiple patterns

stringr

#1

Hi all, I’m trying to use stringr::str_replace_all to substitute a number of fixed patterns in a text vector with one pattern. So something like:

library(stringr)

x = c('banana_cake', 'orange_pear', 'lemon_meringue_tart')
pats = c('banana', 'pear', 'lemon')
str_replace_all(x, pats, 'NOOO')
# [1] "NOOO_cake"          "orange_NOOO"        "NOOO_meringue_tart"

However, if I change the size of either the pattern vector or the string vector to substitute on, I get either warnings about vector length or no replacement (or both):

patterns = c('banana', 'apple')
desserts = c('apple_pie', 'banana_cake', 'pumpkin_pie', 'grape_soda', 'something_else')
str_replace_all(desserts, patterns, 'NOPE')
# [1] "apple_pie"      "banana_cake"    "pumpkin_pie"    "grape_soda"
# [5] "something_else"
# Warning message:
# In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
#   longer object length is not a multiple of shorter object length

Am I just trying to use str_replace_all for something it’s not really designed to do?

(Note: I chose to put this Q here, rather than SO, because I suspect I am in fact trying to use the wrong tool for the job and that this will probably turn into a discussion on the tool I should be using :wink: )


#2

When fed a single pattern, str_replace_all compares that pattern against every element. However, if you pass it a vector of patterns, it pairs them up in order: the first pattern is compared with the first element, the second pattern with the second element, and so on.

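As the warning suggests, the problem arises when these two vectors differ in length. In your output the shorter pattern vector was recycled (banana, apple, banana, apple, banana), so no pattern was ever paired with an element it matches and nothing was replaced. Tidyverse packages are generally stricter about recycling than base R, precisely to avoid this kind of unintended effect.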
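
To make the element-wise pairing concrete, here is a minimal sketch, assuming stringr is installed. The names `fruit`, `pats_matching` and `pats_swapped` are made up for illustration. When both vectors have the same length, each pattern is only tried against the string in the same position:

```r
library(stringr)

fruit <- c("apple_pie", "banana_cake")

# Same length, same order: pattern i is only compared with string i.
pats_matching <- c("apple", "banana")
str_replace_all(fruit, pats_matching, "NOPE")
# [1] "NOPE_pie"  "NOPE_cake"

# Same patterns in the other order: no pair lines up, so nothing is replaced.
pats_swapped <- c("banana", "apple")
str_replace_all(fruit, pats_swapped, "NOPE")
# [1] "apple_pie"   "banana_cake"
```

The second call is silent because the lengths match, which is why a mismatch in order can be harder to spot than a mismatch in length.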

To solve your problem, I suggest you try using the regex operator |.

pats <- c("apple|banana|pumpkin")

This should match apple, banana OR pumpkin.

Note - I think this works, but I haven’t had time to try it, sorry!
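
If the patterns already live in a character vector, the alternation can be built rather than typed by hand. This is a minimal sketch, assuming stringr is installed and that none of the patterns contain regex metacharacters (such as `.` or `(`), which would need escaping first. `fruit_pats` and `any_fruit` are illustrative names:

```r
library(stringr)

desserts <- c("apple_pie", "banana_cake", "pumpkin_pie", "grape_soda", "something_else")
fruit_pats <- c("apple", "banana", "pumpkin")

# Collapse the vector into a single regex string: "apple|banana|pumpkin"
any_fruit <- paste(fruit_pats, collapse = "|")

# A length-one pattern is compared against every element of desserts.
result <- str_replace_all(desserts, any_fruit, "NOPE")
result
```

Because `any_fruit` has length one, there is no element-wise pairing to worry about, whatever the length of `desserts`.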


#3

Ahh, thanks @ConnorKirk! I’ve understood the vector recycling error in other contexts, but I didn’t realise str_replace_all would compare the pattern’s and the target object’s elements in an element-wise fashion. That makes a lot more sense :smile:

The | operator appears to work absolutely fine:

> library(stringr)
> desserts = c('apple_pie', 'banana_cake', 'pumpkin_pie', 'grape_soda', 'something_else')
> pats <- c("apple|banana|pumpkin")
> pats
[1] "apple|banana|pumpkin"
> str_replace_all(desserts, pats, 'NOOO')
[1] "NOOO_pie"       "NOOO_cake"      "NOOO_pie"       "grape_soda"    
[5] "something_else"
> str_replace_all(desserts, 'pie', 'NOOO')
[1] "apple_NOOO"     "banana_cake"    "pumpkin_NOOO"   "grape_soda"    
[5] "something_else"
> str_replace_all(desserts, 'pie|pumpkin', 'NOOO')
[1] "apple_NOOO"     "banana_cake"    "NOOO_NOOO"      "grape_soda"    
[5] "something_else" 

… I should go have dinner. :thinking: