Hi! I am pretty new to R and regular expressions, and I am looking for suggestions on how to use the strsplit() function with a regular expression that suits my needs. I am writing a program to count consonant clusters (two or more consonants in a row) in a word, but I am having trouble writing a regex that describes them properly.
Here is the code I am working with:
klattese <- "strieeteff"
split <- strsplit(klattese, "[iIEe@aWY^cOoUuRx|X\\-\\']+")
This works for a string like the klattese above, producing the expected output "str" "t" "ff", but it fails on more complicated strings:
klattese <- "^-b@ˈn-d^nd"
split <- strsplit(klattese, "[iIEe@aWY^cOoUuRx|X\\-\\']+")
This produces "" "-b" "'n-d" "nd", but my expected output is "b" "n" "d" "nd".
Are there any suggestions for a different regex I could use to get the expected results? I think it may have something to do with the special characters ' and -, but I am not certain. I have also tried the regex "[iIEe@aWY^cOoUuRx|X\-\']+", using just one backslash escape, but still no luck.
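For reference, here is a minimal sketch of one direction that might work. It rests on two assumptions I have not confirmed. First, in R's default regex engine a backslash does not seem to escape - inside [...] (?regex suggests placing the hyphen first or last in the class instead). Second, the mark in the second string looks like the stress character U+02C8 rather than an ASCII apostrophe, so it would have to be listed separately. The names vowel_class, split_consonants, and count_clusters are hypothetical helpers, not part of my original code.

```r
# Assumption: everything that is NOT a consonant goes in this class.
# The hyphen is placed last so it is read literally instead of as a range.
# "\u02c8" is the stress mark (U+02C8), which is a different character
# from the ASCII apostrophe that is also listed.
vowel_class <- "[iIEe@aWY^cOoUuRx|X'\u02c8-]+"

# Split a word into its runs of consonants.
split_consonants <- function(word) {
  pieces <- strsplit(word, vowel_class)[[1]]
  # strsplit() leaves "" in front when the word starts with a match; drop it.
  pieces[nzchar(pieces)]
}

# Count runs made of two or more consonants.
count_clusters <- function(word) {
  sum(nchar(split_consonants(word)) >= 2)
}

split_consonants("strieeteff")        # "str" "t" "ff"
split_consonants("^-b@\u02c8n-d^nd")  # "b" "n" "d" "nd"
count_clusters("strieeteff")          # 2
```

Filtering with nzchar() is only there because the second string starts with separator characters, which makes strsplit() return an empty first element.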