I am using the expression below to replace every character except a-zA-Z0-9& with a space, but I would also like to keep the dash (-) so that words like "re-entry" and "R&D" stay intact. How can I add the dash (and other characters) to the expression?
gsub("[^a-zA-Z0-9&]", " ", output, ignore.case = TRUE)
For special characters, I think escaping with \\ in R will do the trick. Here is an example where \\- matches a literal -, first on its own and then inside a negated character class:
gsub("\\-", "+", "re-entry")
#> [1] "re+entry"
gsub("[^a-zA-Z0-9&\\-]", "+", "re-entry*")
#> [1] "re-entry+"
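To tie this back to the original question, here is a minimal sketch that applies the escaped-dash class to a whole sentence. The `output` string below is made up for illustration (the real `output` is not shown in the question), and the whitespace clean-up at the end is an optional extra step, not part of the original expression.

```r
# Hypothetical input; the question does not show the real `output` value.
output <- "Re-entry costs for R&D (2024) rose by 5%!"

# Replace every character that is NOT a letter, digit, "&" or "-" with a space.
cleaned <- gsub("[^a-zA-Z0-9&\\-]", " ", output)

# Optional: collapse runs of spaces and trim the ends.
tidy <- trimws(gsub(" +", " ", cleaned))
tidy
#> [1] "Re-entry costs for R&D 2024 rose by 5"
```

Both "Re-entry" and "R&D" survive, while the brackets, "%" and "!" are replaced.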
> library(stringr)
# extract single-hyphenated strings
> text <- "Lorem 6ipsum-dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore-magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. D-uis aute-irure do&lor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint-occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
# inside [ ], "|" is a literal pipe rather than alternation, so it is left out here
pattern <- "[a-zA-Z0-9&]+[&-]+[a-zA-Z0-9&]+"
str_extract_all(text, pattern)
[[1]]
[1] "6ipsum-dolor" "dolore-magna" "D-uis" "aute-irure"
[5] "do&lor" "sint-occaecat"
Escaping a dash inside [ and ] seems like best practice and has my vote. As an alternative, one is allowed to place a "-" as the first or last character inside [ and ] (in a negated class, first means directly after the ^), e.g. "[^a-zA-Z0-9&-]", in which case escaping is not required.
sub("[abc-]", "!!", "hel-lo")
# [1] "hel!!lo"
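As a quick check of the point above, the sketch below compares the three spellings (escaped, dash last, dash first) on a made-up test string, and then shows what goes wrong when an unescaped dash sits in the middle of the class. The variable names are only for illustration.

```r
x <- "re-entry* R&D"

escaped <- gsub("[^a-zA-Z0-9&\\-]", "+", x)  # dash escaped
last    <- gsub("[^a-zA-Z0-9&-]",   "+", x)  # dash as the last character
first   <- gsub("[^-a-zA-Z0-9&]",   "+", x)  # dash first, right after ^

# All three classes keep the dash and replace "*" and the space.
identical(escaped, last) && identical(last, first)
#> [1] TRUE

# In the middle of a class, an unescaped dash defines a range instead.
range_class   <- gsub("[a-c]", "!", "abc-")  # a, b, c replaced; "-" kept
literal_class <- gsub("[ac-]", "!", "abc-")  # a, c, "-" replaced; b kept
```

Here `range_class` is "!!!-" and `literal_class` is "!b!!", which is why the position of the dash matters when it is not escaped.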