I have a dataframe of tweets, many with emojis in the text field, that I want to tokenize using tidytext. Many of the emojis lack a space between them and other emojis/text, making it hard to tokenize.
example <- "My priorities Saftey First\U0001f1fa\U0001f1f8\U0001f64f What were yours?"
I would like to use str_replace_all (or another option) to add a space before each emoji, as below:
"My priorities Saftey First \U0001f1fa \U0001f1f8 \U0001f64f What were yours?"
I have tried using the following but get an error:
str_replace_all(example, "\\U", " \\U")
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
  Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE)
Working off of this example, I also tried the below, but it did not seem to alter the text.
str_replace_all(example, "\\\\U", " \\\\U")
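Note that in an R string literal, \U0001f1fa is already the emoji character itself, so the string contains no literal backslash followed by U for either pattern to find. A minimal sketch of one possible approach is to match the emoji code points directly and re-insert each with a leading space. The code point ranges and the names emoji_class and spaced are assumptions chosen for illustration; the ranges cover this example but are not an exhaustive definition of "emoji".

```r
library(stringr)

example <- "My priorities Saftey First\U0001f1fa\U0001f1f8\U0001f64f What were yours?"

# Assumed emoji ranges (not exhaustive): U+1F000-U+1FAFF covers flags
# (regional indicators) and most pictographs; U+2600-U+27BF covers
# miscellaneous symbols and dingbats.
emoji_class <- "[\\x{1F000}-\\x{1FAFF}\\x{2600}-\\x{27BF}]"

# Capture each matching character and re-emit it with a leading space.
spaced <- str_replace_all(example, paste0("(", emoji_class, ")"), " \\1")

print(spaced)
```

If an emoji is already preceded by a space, this adds a second one; passing the result through str_squish() should collapse the repeated whitespace.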