Regular expression with the dash character

I am using the expression below to replace every character except a-zA-Z0-9& with a space, but I would also like to keep the dash (-) so that words like "re-entry" and "R&D" stay intact. How do I add the dash to the expression?
gsub ( "[^a-zA-Z0-9&]", " ", output, = TRUE )

For a special character, I think escaping it with \\ in R will do the trick. Here is an example where \\- matches a literal -:

gsub("\\-", "+", "re-entry")
#> [1] "re+entry"
gsub("[^a-zA-Z0-9&\\-]", "+", "re-entry*")
#> [1] "re-entry+"
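
Applied to the original question, the same character class can be used with a space as the replacement. This is only a sketch: `output` below is a made-up sample string, not the asker's data, and the whitespace clean-up at the end is an optional extra step that was not part of the question.

```r
# Hypothetical sample input; the real `output` from the question is not shown
output <- "re-entry, R&D!"

# Replace everything except letters, digits, "&" and "-" with a space
cleaned <- gsub("[^a-zA-Z0-9&\\-]", " ", output)
cleaned
#> [1] "re-entry  R&D "

# Optional: collapse runs of spaces and trim the ends
tidy <- trimws(gsub(" +", " ", cleaned))
tidy
#> [1] "re-entry R&D"
```
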
> library(stringr)

> # extract words joined by a hyphen or an ampersand
> text <- "Lorem 6ipsum-dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore-magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo con-sequat. D-uis aute-irure do&lor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint-occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
> # inside [ ], "|" is a literal pipe rather than alternation, so the separator class is just [&-]
> pattern <- "[a-zA-Z0-9&]+[&-]+[a-zA-Z0-9&]+"
> str_extract_all(text, pattern)
[[1]]
[1] "6ipsum-dolor"  "dolore-magna"  "con-sequat"    "D-uis"         "aute-irure"    "do&lor"       
[7] "sint-occaecat"

Escaping a dash inside [ and ] seems like best practice and has my vote. As an alternative, you can place the "-" as the first character (right after the ^ in a negated class) or the last character inside [ and ], e.g. "[^a-zA-Z0-9&-]", in which case escaping is not required.

sub("[abc-]", "!!", "hel-lo")
# [1] "hel!!lo"
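
To illustrate why the position matters, here is a small sketch with made-up input strings. A dash between two characters is read as a range, while a dash at the edge of the class is a literal; the escaped, dash-first, and dash-last spellings of the negated class should all behave the same.

```r
# Dash between two characters: a range covering a, b and c
range_result <- gsub("[a-c]", "!", "abc-d")
range_result
#> [1] "!!!-d"

# Dash at the end: the literal characters a, c and -
literal_result <- gsub("[ac-]", "!", "abc-d")
literal_result
#> [1] "!b!!d"

# Three spellings of the class from the question, applied to a sample string
x <- "re-entry*"
escaped    <- gsub("[^a-zA-Z0-9&\\-]", "+", x)
dash_last  <- gsub("[^a-zA-Z0-9&-]", "+", x)
dash_first <- gsub("[^-a-zA-Z0-9&]", "+", x)
c(escaped, dash_last, dash_first)
#> [1] "re-entry+" "re-entry+" "re-entry+"
```
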