Select specific strings with a regular expression

Hi community

I want to select only the strings that contain the letters PI. I tried a regular expression, but I'm not an expert in this topic.

Once I have all the PI values, I also need to put a space between PI and the numbers, like this: PI 313836 PI 535336

DATA2<- data.frame(PI_NUM=c("TARS-324, TARS-324A, PI535336", "PI207372, HDR-0975", "PI313490, HDR-0975, HDR-0475", 
                     "PI313836", "A-947, TARS-142, TARS-156, L-331, L-564, L-566, PI535210", 
                     "PI494141, W6 21109", "TARS-319, PI494131, NI-1062", "NI-1036, PI494131,NI-1062", 
                     "PI325754", "PI313784")) 

DATA3 <- as.data.frame(grep(pattern = '\\PI.?([0-9])', 
                             x = DATA2, 
                             value = T));DATA3
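A note on why this attempt does not extract anything: `grep()` (and `grepl()`) only report *whether* an element matches, and with `value = TRUE` they return the whole matching element, not the matched substring. Passing the entire data frame as `x` also makes R coerce it to character first. A minimal base R sketch that returns just the matched text, assuming the column is available as a plain character vector (the name `pi_num` is hypothetical and holds a subset of `DATA2$PI_NUM`):

```r
# Hypothetical vector: a few rows copied from DATA2$PI_NUM in the question
pi_num <- c("TARS-324, TARS-324A, PI535336",
            "PI207372, HDR-0975",
            "NI-1036, PI494131,NI-1062")

# grepl() only says WHETHER each string contains "PI" followed by digits
has_pi <- grepl("PI[0-9]+", pi_num)

# regexpr() locates the first match in each string; regmatches() returns
# only the matched text (strings without a match are dropped)
first_pi <- regmatches(pi_num, regexpr("PI[0-9]+", pi_num))

# A capture group keeps the digits and puts them after "PI "
first_pi <- sub("^PI([0-9]+)$", "PI \\1", first_pi)
first_pi
```

Because non-matching strings are dropped, the result can be shorter than the input; that is why the tidyverse answer (which returns `NA` for non-matches) is easier to put back into a data frame column.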

Does this produce the desired outcome?

library(tidyverse)
DATA2<- data.frame(PI_NUM=c("TARS-324, TARS-324A, PI535336", "PI207372, HDR-0975", "PI313490, HDR-0975, HDR-0475", 
                            "PI313836", "A-947, TARS-142, TARS-156, L-331, L-564, L-566, PI535210", 
                            "PI494141, W6 21109", "TARS-319, PI494131, NI-1062", "NI-1036, PI494131,NI-1062", 
                            "PI325754", "PI313784")) 

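DATA3 = DATA2 |>
  # keep only the first "PI" followed by exactly six digits in each row
  mutate(PI_NUM = str_extract(PI_NUM, 'PI[0-9]{6}')) |>
  # insert a space between "PI" and the digits
  mutate(PI_NUM = str_replace(PI_NUM, 'PI', 'PI '))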

DATA3
#>       PI_NUM
#> 1  PI 535336
#> 2  PI 207372
#> 3  PI 313490
#> 4  PI 313836
#> 5  PI 535210
#> 6  PI 494141
#> 7  PI 494131
#> 8  PI 494131
#> 9  PI 325754
#> 10 PI 313784

Created on 2022-12-14 with reprex v2.0.2
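`str_extract()` returns only the first match in each row, which is enough for this data since every row has one PI number. If a row could contain several (as in the `PI 313836 PI 535336` example in the question), one option is to extract every match and join them. A base R sketch, assuming a hypothetical vector `pi_num` whose second element is made up to contain two PI numbers (stringr's `str_extract_all()` is the tidyverse counterpart of `gregexpr()` + `regmatches()`):

```r
# Hypothetical input: the second string is invented to hold two PI numbers
pi_num <- c("TARS-324, TARS-324A, PI535336",
            "PI313836 PI535336, HDR-0975")

# gregexpr() finds EVERY match per string; regmatches() returns a list
all_pi <- regmatches(pi_num, gregexpr("PI[0-9]+", pi_num))

# Add the space to each match, then join multiple matches per row;
# a row with no PI number would come out as ""
all_pi <- vapply(all_pi,
                 function(m) paste(sub("^PI", "PI ", m), collapse = " "),
                 character(1))
all_pi
```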


Yes, that is the desired outcome.

💪

