Some doubts about regular expressions

I don't understand why I get an empty result with the first regex:

library(stringr)
x <- "Roman numerals: MDCCCLXXXVIII"
str_match(x, "C?")
#>      [,1]
#> [1,] ""
str_match(x, "CC?")
#>      [,1]
#> [1,] "CC"

Created on 2019-11-06 by the reprex package (v0.3.0)

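Because that regex means "match a C if there is one, but matching nothing is OK too". The regex engine tries the pattern at the first position of the string, finds "R" rather than "C", and so succeeds there with an empty match. `str_match()` only returns the first match, so you get the empty string `""` (this is different from NA or NULL). With "CC?" the first C is mandatory, so the engine has to move forward until it finds one. Looking at all the matches may make this more evident: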

library(stringr)
x <- "Roman numerals: MDCCCLXXXVIII"
str_match_all(x, "C?")
#> [[1]]
#>       [,1]
#>  [1,] ""  
#>  [2,] ""  
#>  [3,] ""  
#>  [4,] ""  
#>  [5,] ""  
#>  [6,] ""  
#>  [7,] ""  
#>  [8,] ""  
#>  [9,] ""  
#> [10,] ""  
#> [11,] ""  
#> [12,] ""  
#> [13,] ""  
#> [14,] ""  
#> [15,] ""  
#> [16,] ""  
#> [17,] ""  
#> [18,] ""  
#> [19,] "C" 
#> [20,] "C" 
#> [21,] "C" 
#> [22,] ""  
#> [23,] ""  
#> [24,] ""  
#> [25,] ""  
#> [26,] ""  
#> [27,] ""  
#> [28,] ""  
#> [29,] ""  
#> [30,] ""
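
If it helps, here is a small sketch of the same idea. I am assuming the goal was to pull out the run of C's, which the original question does not say, and the variable names are just for illustration. It contrasts an empty match with a real non-match, then uses `C+` (one or more C) so that the empty match is no longer allowed.

```r
library(stringr)
x <- "Roman numerals: MDCCCLXXXVIII"

# "C?" can succeed without consuming anything, so the first match is
# the empty string at the start of x. It is a match, not a missing value.
empty_match <- str_extract(x, "C?")
is.na(empty_match)
#> [1] FALSE

# A pattern that cannot match anywhere in x gives NA instead.
no_match <- str_extract(x, "Q")
is.na(no_match)
#> [1] TRUE

# "C+" needs at least one C, so the engine skips ahead to the first real C
# and then greedily takes the whole run.
run_of_c <- str_extract(x, "C+")
run_of_c
#> [1] "CCC"

# Character positions of that run inside x.
str_locate(x, "C+")
#>      start end
#> [1,]    19  21
```

The positions 19 to 21 line up with rows 19 to 21 of the `str_match_all()` output, which are the only rows holding a "C".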

I see. Thank you very much
