Formulating regex patterns.

Hi,

I would appreciate help with constructing a regex pattern. I have a character vector such as:

vector <- c("r01052",
            "r0105a2",
            "r01052a",
            "r0105a2a")

I would like to create two columns. The first should contain the first 5 characters, or the first 6 characters if the fifth character is followed by a letter. In this case, I would get:

first_column <- c("r0105",
                  "r0105a",
                  "r0105",
                  "r0105a")

The second should contain the last character, or the last two characters if the last character is a letter. In this case, I would get:

second_column <- c("2",
                   "2",
                   "2a",
                   "2a")

Could you help me construct regex patterns for str_extract? Note that the provided vector is just an example: the individual letters and digits change, and the only constant is the pattern of letters and digits.

Many thanks,

Jakub

Hi @Jakub_Komarek

I used the same regex for both columns: str_extract keeps the matched prefix for the first column, and str_remove drops that prefix to leave the second.

library(stringr)

vector <- c("r01052",
            "r0105a2",
            "r01052a",
            "r0105a2a")

first_column <- str_extract(vector, "^\\w{5}[a-zA-Z]?")
second_column <- str_remove(vector, "^\\w{5}[a-zA-Z]?")

Hope it helps.
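
For anyone who wants to check this against the expected output in the question, here is a minimal sketch. It assumes stringr is installed; `codes`, `pattern`, `expected_first` and `expected_second` are names chosen here for illustration and are not part of the original reply.

```r
library(stringr)

# Example input from the question (named `codes` here to avoid
# shadowing base R's vector() function).
codes <- c("r01052", "r0105a2", "r01052a", "r0105a2a")

# Five word characters from the start, then an optional letter.
pattern <- "^\\w{5}[a-zA-Z]?"

first_column  <- str_extract(codes, pattern)  # keep the matched prefix
second_column <- str_remove(codes, pattern)   # keep what is left after it

# Expected values copied from the question.
expected_first  <- c("r0105", "r0105a", "r0105", "r0105a")
expected_second <- c("2", "2", "2a", "2a")

stopifnot(identical(first_column, expected_first))
stopifnot(identical(second_column, expected_second))
```

One thing to keep in mind: `\\w` matches letters, digits and underscores, so this pattern does not check that the first five characters are one letter followed by four digits. If the input can contain other shapes, a stricter pattern may be safer.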

Another option

library(stringr)

vector <- c("r01052",
            "r0105a2",
            "r01052a",
            "r0105a2a")

str_match(vector, "(?<first>^[a-z]\\d{4}[a-z]?)(?<second>\\d[a-z]?$)")[,2:3]
#>      first    second
#> [1,] "r0105"  "2"   
#> [2,] "r0105a" "2"   
#> [3,] "r0105"  "2a"  
#> [4,] "r0105a" "2a"

Created on 2022-03-19 by the reprex package (v2.0.1)
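
If the goal is to end up with actual data frame columns, the same two capture groups can probably be reused with base R's `sub()`. This is only a sketch: the data frame `df`, its `code` column and the variable `pattern` are hypothetical names, and digits are written as `[0-9]` rather than `\\d`.

```r
# Hypothetical data frame holding the example codes from the question.
df <- data.frame(code = c("r01052", "r0105a2", "r01052a", "r0105a2a"),
                 stringsAsFactors = FALSE)

# Group 1: one letter, four digits, optional letter.
# Group 2: one digit, optional trailing letter.
pattern <- "^([a-z][0-9]{4}[a-z]?)([0-9][a-z]?)$"

# Replace the whole string with the contents of one capture group.
# Note: sub() returns the input unchanged when the pattern does not match.
df$first_column  <- sub(pattern, "\\1", df$code)
df$second_column <- sub(pattern, "\\2", df$code)

print(df)
```

Because `sub()` leaves non-matching strings untouched, it may be worth checking `grepl(pattern, df$code)` first if some codes might not follow this layout.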
