Regex - issue with specific phrases to be found

Hi,
I have an issue with the Regex function.
I'm trying to specify weird comments which should be treated as blanks, such as:
"no comment", "No Comments", "Nothing", "n a", "n.a", "n.a.", "N/A", "pass" and "na".

I have this code (thank you for helping me with this, andresrcs):

blank_statements <- regex("no\\scomment?|nothing|n\\sa|n.a|n.a.|N/A|pass|na", ignore_case = TRUE)

but I have the following issues:

  1. N/A is not recognised
  2. I need to find sentences with "no" as a separate word, but the regex finds everything containing "no" (for example "monotony" or "nose")
  3. I'm not sure about words with dots (like n.a)

Can you help please?

Maybe you are confusing a missing value (NA, "Not Available") with a character string. In that case you have to test with is.na().
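To illustrate the difference, here is a minimal sketch with made-up sample data (the vector x is hypothetical, not taken from the original data). It shows that a true missing value and the text "N/A" behave differently:

```r
# Hypothetical sample: the text "N/A", a true missing value, an ordinary comment
x <- c("N/A", NA, "no comment")

# is.na() flags only the true missing value, not the text "N/A"
missing_flags <- is.na(x)
missing_flags
#> [1] FALSE  TRUE FALSE

# Comparing against the string "N/A" yields NA for the missing value
text_flags <- x == "N/A"
text_flags
#> [1]  TRUE    NA FALSE
```

If Excel cells arrive as real NA values, no regex will ever match them, which may be why "N/A" seemed not to be recognised.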

One option is to match whitespace before and after "no", like \\sno\\s.
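A possible alternative, sketched here with made-up test strings, is the word boundary \\b. Unlike \\s, it does not require an actual whitespace character, so it also works when "no" is at the very start or end of the string:

```r
library(stringr)

# Hypothetical test strings
words <- c("no", "No thanks", "I said no", "monotony", "nose")

# \\s needs a whitespace character on BOTH sides, so none of these match
spaces_only <- str_detect(words, regex("\\sno\\s", ignore_case = TRUE))
spaces_only
#> [1] FALSE FALSE FALSE FALSE FALSE

# \\b matches the position between a word character and a non-word character
# (or the string edge), so "no" is found only as a separate word
word_boundary <- str_detect(words, regex("\\bno\\b", ignore_case = TRUE))
word_boundary
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```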

In regular expressions, "." is a metacharacter that means "any character except a newline". If you want to match a literal dot, you have to escape the metacharacter like this: \\.
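A small sketch of the difference, using made-up strings. The fixed() variant is just another way to get a literal match in stringr, shown here for comparison:

```r
library(stringr)

# Hypothetical test strings
x <- c("n.a", "nxa", "n/a")

# Unescaped dot: any single character between "n" and "a" matches
any_char <- str_detect(x, "n.a")
any_char
#> [1] TRUE TRUE TRUE

# Escaped dot: only a literal "." matches
literal_dot <- str_detect(x, "n\\.a")
literal_dot
#> [1]  TRUE FALSE FALSE

# fixed() treats the whole pattern as literal text, so no escaping is needed
literal_fixed <- str_detect(x, fixed("n.a"))
literal_fixed
#> [1]  TRUE FALSE FALSE
```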

Also, please consider each new topic as independent from your previous ones. You have to provide its own sample data and a proper reprex; don't expect people to look for context in your previous posts.

Thank you.
"N/A" is difficult and I still don't know how to replace "/" in regex.
The only way I found is not really elegant, but I can recode all variables with N/A straight after importing the data from Excel with this:

source$comment[source$comment=="N/A"|source$comment=="N/a"|source$comment=="n/A"|source$comment=="n/a"] <- "blank"
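For what it's worth, "/" is not a metacharacter in the regex flavour stringr uses, so it needs no escaping. A minimal sketch of the same recoding with one case-insensitive pattern, assuming a character vector like source$comment (the vector comment below is made-up sample data):

```r
library(stringr)

# Hypothetical stand-in for source$comment
comment <- c("N/A", "n/a", "N/a", "fine", NA)

# "/" is matched literally; ^ and $ restrict the match to the whole string
is_na_text <- str_detect(comment, regex("^n/a$", ignore_case = TRUE))

# which() drops the NA produced for the true missing value before assigning
comment[which(is_na_text)] <- "blank"

# The three "N/A" variants are now "blank"; "fine" and NA are unchanged
comment
```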

then I can start processing. "No" is fixed but I still have issues with "n.a.", "na" and "n.a". I am using this:

blank_statements <- regex("\\sno\\s|\\sna\\s|\\sna\\.s|\\sna\\.s\\.", ignore_case = TRUE)

What am I doing wrong?

\\s matches a whitespace character, so the regex you wrote, \\sna\\.s\\., would match " na.s."

This would match the "NA" variants you are showing. The unescaped "." matches any single character, so it also covers "/".

library(stringr)
sample <- c("n.a.", "na", "n.a", "n/a")
str_detect(sample, regex("^\\s*n.?a.?\\s*$", ignore_case = TRUE))
#> [1] TRUE TRUE TRUE TRUE
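Building on this, here is one possible sketch that covers the whole list of phrases from the first post in a single pattern. It assumes a comment should count as blank only when the entire string is one of those phrases; the names blank_pattern, blanks and real_comments are made up for the example:

```r
library(stringr)

# Phrases from the first post that should be treated as blank
blanks <- c("no comment", "No Comments", "Nothing", "n a", "n.a",
            "n.a.", "N/A", "pass", "na")

# Hypothetical real comments that must NOT be treated as blank
real_comments <- c("monotony", "nose", "no problem here", "banana")

# ^...$ anchors the match to the whole string (ignoring outer whitespace).
# [\\s./]? allows an optional space, dot or slash between "n" and "a";
# inside a character class the dot is literal.
blank_pattern <- regex(
  "^\\s*(no\\s+comments?|nothing|n[\\s./]?a\\.?|pass)\\s*$",
  ignore_case = TRUE
)

all_blanks_found <- all(str_detect(blanks, blank_pattern))
all_blanks_found
#> [1] TRUE

any_false_positive <- any(str_detect(real_comments, blank_pattern))
any_false_positive
#> [1] FALSE
```

Because the character class lists the allowed separators explicitly, something like "nxa" is not matched, which the unescaped dot above would accept.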

Hurray!!!! Thank you very much!
