library(tidyverse)
# toy data
df <- tibble(
  text = c("abcdefgh", "abcd-efg", "123d*-e", "567xyz", "'!abc")
)
df
#> # A tibble: 5 × 1
#>   text
#>   <chr>
#> 1 abcdefgh
#> 2 abcd-efg
#> 3 123d*-e
#> 4 567xyz
#> 5 '!abc
How can I mutate a new column, say, issue, that identifies whether the column text contains non-alphanumeric characters other than -? In other words, issue should be NA if text contains only alphanumeric characters or -.
Sure. The rowwise() call is necessary because str_extract_all() inside case_when() is vectorized: without grouping, it runs over the whole text column at once. If you left rowwise() out, the pasted result would combine the matches from all rows, so in this case both rows where the condition is true would get *'!. rowwise() makes sure str_extract_all() only extracts characters from the row currently being evaluated.
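To see the problem, here is a small sketch of the same pipeline with rowwise() left out. It assumes the pattern "[^[:alnum:]\\-]" (any character that is not a letter, digit or minus sign) and uses unlist() to flatten the matches; neither detail is taken from the original answer. Because text is the whole column here, the matches of every row are pasted into one string, which case_when() then recycles into every matching row.

```r
library(tidyverse)

df <- tibble(
  text = c("abcdefgh", "abcd-efg", "123d*-e", "567xyz", "'!abc")
)

# Assumed pattern: anything that is not a letter, a digit or "-"
bad_chars <- "[^[:alnum:]\\-]"

# No rowwise(): inside mutate(), `text` is the entire column
df_no_rowwise <- df %>%
  mutate(
    issue = case_when(
      str_detect(text, bad_chars) ~
        # unlist() flattens the matches of ALL rows before pasting,
        # giving a single string that is recycled to every TRUE row
        paste(unlist(str_extract_all(text, bad_chars)), collapse = ""),
      TRUE ~ NA_character_
    )
  )

df_no_rowwise$issue
```

Rows 3 and 5 both end up with "*'!" instead of their own characters.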
str_detect() checks whether the text column contains any non-alphanumeric characters (other than -). You can think of the chain as follows:
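As a quick illustration of that check on its own, here is a sketch using the same assumed pattern (a negated character class of letters, digits and an escaped minus sign; the answer does not show its exact regex). str_detect() returns one logical per element, TRUE where at least one disallowed character appears.

```r
library(stringr)

text <- c("abcdefgh", "abcd-efg", "123d*-e", "567xyz", "'!abc")

# TRUE where the string has at least one character that is not
# a letter, a digit or the minus sign (assumed pattern)
has_issue <- str_detect(text, "[^[:alnum:]\\-]")
has_issue
```

Only "123d*-e" and "'!abc" are flagged; the hyphen in "abcd-efg" is allowed by the class.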
Take the data and add a column issue by checking, row by row, whether text contains non-alphanumeric characters (excluding the minus sign). If so, extract all of those characters and paste them together into one character scalar (hence paste(collapse = '')). Otherwise, insert NA_character_.
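The answer's own code block is not preserved here, so the following is only a minimal sketch of the chain just described, assuming a recent dplyr/stringr and the pattern "[^[:alnum:]\\-]" (an assumption, not necessarily the answer's regex). The only difference from a plain mutate() is the rowwise() call, which makes text a single value on each evaluation.

```r
library(tidyverse)

df <- tibble(
  text = c("abcdefgh", "abcd-efg", "123d*-e", "567xyz", "'!abc")
)

# Assumed pattern: anything that is not a letter, a digit or "-"
bad_chars <- "[^[:alnum:]\\-]"

df_issue <- df %>%
  rowwise() %>%   # evaluate the expressions below one row at a time
  mutate(
    issue = case_when(
      # does this row's text contain a disallowed character?
      str_detect(text, bad_chars) ~
        # collect this row's disallowed characters into one string
        paste(unlist(str_extract_all(text, bad_chars)), collapse = ""),
      TRUE ~ NA_character_
    )
  ) %>%
  ungroup()       # drop the rowwise grouping again

df_issue
```

With the toy data, issue is NA for "abcdefgh", "abcd-efg" and "567xyz", "*" for "123d*-e" and "'!" for "'!abc".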
I hope this helps you understand the code above a bit better.