Detecting characters other than alphanumeric or -

budugulo · April 2, 2022, 10:19pm

library(tidyverse)
# Toy data
df <- tibble(
  x = c("123Abcde789",
        "46765%-''098",
        "565--456A",
        "1232133456",
        "'''890976")
)

df
#> # A tibble: 5 x 1
#>   x           
#>   <chr>       
#> 1 123Abcde789 
#> 2 46765%-''098
#> 3 565--456A   
#> 4 1232133456  
#> 5 '''890976

How do I identify rows that contain special characters, meaning anything other than alphanumeric characters or - ? For example, the desired code would identify rows 2 and 5 of df, because row 2 contains %'' and row 5 contains '''.

FJCC · April 2, 2022, 10:59pm

This labels the rows that contain special characters as TRUE.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
# Toy data
df <- tibble(
  x = c("123Abcde789",
        "46765%-''098",
        "565--456A",
        "1232133456",
        "'''890976")
)

df <- df |> mutate(Flag=str_detect(x,"[^-A-Za-z0-9]"))
df
#> # A tibble: 5 x 2
#>   x            Flag 
#>   <chr>        <lgl>
#> 1 123Abcde789  FALSE
#> 2 46765%-''098 TRUE 
#> 3 565--456A    FALSE
#> 4 1232133456   FALSE
#> 5 '''890976    TRUE

Created on 2022-04-02 by the reprex package (v2.0.1)
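
If the row numbers or the rows themselves are wanted rather than a flag column, the same pattern can be reused. Here is a minimal sketch in base R (grepl() and which()), so it runs without the tidyverse; the names pattern, flagged_rows and flagged are only illustrative and not part of the code above.

```r
# Toy data from the question, as a plain data.frame (no packages needed)
df <- data.frame(
  x = c("123Abcde789",
        "46765%-''098",
        "565--456A",
        "1232133456",
        "'''890976"),
  stringsAsFactors = FALSE
)

# Same character class as in the str_detect() call:
# any single character that is not -, A-Z, a-z or 0-9
pattern <- "[^-A-Za-z0-9]"

# Row numbers of the strings containing at least one such character
flagged_rows <- which(grepl(pattern, df$x))

# The matching rows themselves
flagged <- df[flagged_rows, , drop = FALSE]

flagged_rows
flagged
```

With the toy data this should pick out rows 2 and 5, matching the Flag column above.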

budugulo · April 2, 2022, 11:59pm

Thanks a lot @FJCC ! Could you please explain the following part? I would like to learn which part is identifying alphanumeric characters and which part is identifying the -.
"[^-A-Za-z0-9]"

FJCC · April 3, 2022, 1:33am

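The square brackets define a character class, and the ^ right after the opening bracket negates it, so the pattern matches any single character that is not listed in the class. The first - represents that character itself in the regular expression; because it comes first in the class, it is taken literally rather than as a range operator. A-Z represents all of the upper case letters, a-z is the lower case letters, and 0-9 is all of the numeric characters.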
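
To see which part of the pattern covers which characters, one option is to test single characters against it and then pull the offending characters out of one of the strings. This is only a sketch in base R; chars, is_special and offending are hypothetical names chosen for the example.

```r
pattern <- "[^-A-Za-z0-9]"

# One-character strings, to see what each part of the class covers
chars <- c("-", "Q", "q", "7", "%", "'", " ")
is_special <- grepl(pattern, chars)
names(is_special) <- chars

# "-", "Q", "q" and "7" are listed inside the class (-, A-Z, a-z, 0-9),
# so the negated class does not match them and they give FALSE.
# "%", "'" and " " are not listed, so they give TRUE.
is_special

# Extract every character in a string that the pattern matches
s <- "46765%-''098"
offending <- regmatches(s, gregexpr(pattern, s))[[1]]
offending
```

Note that the - between the % and the quotes in row 2 is not extracted, since - is one of the allowed characters.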

budugulo · April 3, 2022, 10:28am

Many thanks! It makes sense now
