R code output incorrect even though the code seems correct

Description
The R code below produces correct output only in rows 1, 2, and 10 (see image). The input is in column 1 of the image. Since the code seems correct, can anyone tell me why the output has errors?

Code

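library(dplyr)  # provides %>% and mutate()

str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"

# Group 1: total pitches; groups 2 and 3: the two numbers inside the parentheses
pc3 %>% mutate(
    pitches = gsub(str.regex.pattern, "\\1", Pitcnt), 
    x = gsub(str.regex.pattern, "\\2", Pitcnt), 
    y = gsub(str.regex.pattern, "\\3", Pitcnt)
)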
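One property of `gsub()` worth keeping in mind here: when the pattern does not match an element, that element is returned unchanged rather than as `NA`. So any row whose `Pitcnt` deviates even slightly from the expected shape would show the full original string in `pitches`, `x`, and `y`. The sketch below illustrates this with two made-up values; the en dash (U+2013) in the second is only an assumed example of a look-alike character, not something confirmed in the poster's data.

```r
str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"

# Hypothetical inputs: the second uses an en dash (U+2013) where a hyphen is expected
vals <- c("3 (0-2)", "3 (0\u20132)")

# gsub() leaves non-matching elements untouched instead of returning NA
out <- gsub(str.regex.pattern, "\\2", vals)
print(out)

# grepl() reports which elements the pattern actually matches
matched <- grepl(str.regex.pattern, vals)
print(matched)

# The rows worth inspecting are the ones the pattern fails to match
print(which(!matched))
```

On the real data, `which(!grepl(str.regex.pattern, pc3$Pitcnt))` would list the row numbers to look at first.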

Output

I tried manually testing a couple of your Pitcnt values that don't work, and gsub() works as expected on them. Is it possible that there are invisible characters in the Pitcnt column? Try re-entering some of the data by hand.
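One way to hunt for characters that don't show up on screen is to look at the Unicode code points directly. The sketch below uses a small helper, `has_odd_chars()` (a hypothetical name, not part of any package), that flags strings containing anything outside printable ASCII (code points 32 to 126); `utf8ToInt()` then shows the code points of a suspicious entry. The sample values, including a non-breaking space (U+00A0) and an en dash (U+2013), are assumptions for illustration only.

```r
# Hypothetical helper: TRUE when a string contains any character
# outside printable ASCII (code points 32-126); NA stays NA
has_odd_chars <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA)
    cp <- utf8ToInt(s)
    any(cp < 32L | cp > 126L)
  }, logical(1), USE.NAMES = FALSE)
}

# Made-up values: plain ASCII, a non-breaking space, and an en dash
vals <- c("3 (0-2)", "3\u00a0(0-2)", "3 (0\u20132)")

odd <- has_odd_chars(vals)
print(odd)

# Code points of the third value: 8211 is the en dash; a hyphen would be 45
print(utf8ToInt(vals[3]))
```

Applied to the real column, `pc3$Pitcnt[has_odd_chars(pc3$Pitcnt)]` would return only the suspicious entries.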

str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"
gsub(str.regex.pattern, "\\1", "3 (0-2)")
[1] "3"
gsub(str.regex.pattern, "\\2", "3 (0-2)")
[1] "0"
gsub(str.regex.pattern, "\\3", "3 (0-2)")
[1] "2"
gsub(str.regex.pattern, "\\1", "8 (3-2)")
[1] "8"
gsub(str.regex.pattern, "\\2", "8 (3-2)")
[1] "3"
gsub(str.regex.pattern, "\\3", "8 (3-2)")
[1] "2"

I can't find any invisible characters. Below is how the output still appears on my screen; everything inside the red box is incorrect.

Please post your data here. If the tibble is named DF, post the output of

dput(DF)
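`dput()` prints R code that rebuilds the object, which is why it is the usual way to share data on a forum: pasting its output recreates the same values on another machine. A small base-R sketch of that round trip, using a plain `data.frame` with a few of the `Pitcnt` values rather than the poster's tibble:

```r
DF <- data.frame(Pitcnt = c("12 (3-2)", "7 (0-1)", "3 (0-2)"))

# Capture the code that dput() would print to the console
code <- capture.output(dput(DF))
cat(code, sep = "\n")

# Evaluating that code reconstructs the same data frame
DF2 <- eval(parse(text = code))
print(identical(DF, DF2))
```

For a large table, `dput(head(DF, 20))` keeps the posted output to a manageable size.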

Here is the data:

structure(list(Pitcnt = c("12 (3-2)", "7 (0-1)", "3 (0-2)", 
"8 (3-2)", "5 (2-2)", "7 (3-2)", "1 (0-0)", "2 (0-1)", "5 (1-2)", 
"10 (3-2)")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

With your example data, the code works for me:

library(dplyr)

pc3 <- structure(list(Pitcnt = c("12 (3-2)", "7 (0-1)", "3 (0-2)",
                                 "8 (3-2)", "5 (2-2)", "7 (3-2)", "1 (0-0)", "2 (0-1)", "5 (1-2)",
                                 "10 (3-2)")),
                 class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"

pc3 %>% mutate(
  pitches = gsub(str.regex.pattern, "\\1", Pitcnt), 
  x = gsub(str.regex.pattern, "\\2", Pitcnt), 
  y = gsub(str.regex.pattern, "\\3", Pitcnt)
)
#> # A tibble: 10 × 4
#>    Pitcnt   pitches x     y    
#>    <chr>    <chr>   <chr> <chr>
#>  1 12 (3-2) 12      3     2    
#>  2 7 (0-1)  7       0     1    
#>  3 3 (0-2)  3       0     2    
#>  4 8 (3-2)  8       3     2    
#>  5 5 (2-2)  5       2     2    
#>  6 7 (3-2)  7       3     2    
#>  7 1 (0-0)  1       0     0    
#>  8 2 (0-1)  2       0     1    
#>  9 5 (1-2)  5       1     2    
#> 10 10 (3-2) 10      3     2

Created on 2023-05-26 with reprex v2.0.2

If you copy and paste my code above, does it work for you?
Sometimes restarting R and RStudio fixes inexplicable results.

Pasting your code worked for the pc3 dataset; however, that dataset is only a small sample of the full dataset. When I ran the code again on the full dataset, the same problem returned. Below is a larger sample from the full dataset. If someone could test the code with this data, I would much appreciate it, because the new sample data (with the pasted code) also produces the same issue on my device.

While trying to resolve this issue, I restarted RStudio and updated R to version 4.3.

structure(list(Gm = c(36, 18, 26, 30, 14, 3, 32, 6, 20), Date = c("2023-04-27", 
"2023-04-10", "2023-04-03", "2023-04-07", "2023-05-16", "2023-04-18", 
"2023-04-04", "2023-04-22", "2023-04-28"), Pitcher = c("Brooks Raley", 
"Max Scherzer", "Tommy Hunter", "Dennis Santana", "Justin Verlander", 
"Tylor Megill", "Brooks Raley", "David Peterson", "David Peterson"
), Opp = c("WSN", "SDP", "@MIL", "MIA", "TBR", "@LAD", "@MIL", 
"@SFG", "ATL"), Batter = c("CJ Abrams", "Austin Nola", "Brice Turang", 
"Garrett Cooper", "Isaac Paredes", "J.D. Martinez", "Brian Anderson", 
"Brandon Crawford", "Matt Olson"), Score = c("ahead 7-4", "ahead 2-0", 
"down 0-6", "ahead 6-0", "tied 0-0", "tied 0-0", "down 0-5", 
"down 0-1", "down 0-1"), Event = c("HR", "SO", "HR", "HR", "HR", 
"HR", "HR", "HR", "HR"), Inn = c("t8", "t5", "b5", "t8", "t3", 
"b1", "b7", "b1", "t5"), RoB = c("123", "1--", "123", "12-", 
"12-", "1--", "12-", "12-", "1-3"), Out = c(1, 2, 1, 2, 2, 1, 
2, 2, 2), Pitcnt = c("2 (0-1)", "11 (3-2)", "2 (0-1)", "8 (3-2)", 
"6 (3-2)", "10 (3-2)", "2 (0-1)", "6 (3-2)", "1 (0-0)"), 
    R = c(4, 0, 4, 3, 3, 2, 3, 3, 3), WPA = c(-0.54, 0.03, -0.01, 
    -0.02, -0.29, -0.16, -0.01, -0.21, -0.25), RE24 = c(-2.71, 
    0.22, -2.7, -2.68, -2.68, -1.73, -2.67, -2.67, -2.62), LI = c(3.9, 
    0.98, 0.07, 0.16, 1.89, 1.15, 0.08, 1.42, 1.91), PlayDesc = c("Home Run (Fly Ball to Deep CF-RF); Thomas Scores/unER; Garrett Scores; Robles Scores", 
    "Strikeout Swinging", "Home Run (Fly Ball to Deep CF-RF); Tellez Scores; Mitchell Scores; Anderson Scores", 
    "Home Run (Fly Ball to Deep CF-RF); Arraez Scores; Soler Scores", 
    "Home Run (Fly Ball to Deep LF); Ramírez Scores; Franco Scores", 
    "Home Run (Fly Ball to Deep CF-RF); Freeman Scores", "Home Run (Fly Ball to Deep CF); Yelich Scores; Adames Scores", 
    "Home Run (Fly Ball to Deep RF Line); Conforto Scores; Flores Scores", 
    "Home Run (Fly Ball to Deep CF-RF); Harris Scores; Acuña Scores"
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-9L))

Here is what I get.

library(dplyr)

pc3 <- structure(list(Gm = c(36, 18, 26, 30, 14, 3, 32, 6, 20), 
                      Date = c("2023-04-27", "2023-04-10", "2023-04-03", "2023-04-07", "2023-05-16", "2023-04-18", 
                               "2023-04-04", "2023-04-22", "2023-04-28"), 
                      Pitcher = c("Brooks Raley", 
                                  "Max Scherzer", "Tommy Hunter", "Dennis Santana", "Justin Verlander", 
                                  "Tylor Megill", "Brooks Raley", "David Peterson", "David Peterson"
                      ), 
                      Opp = c("WSN", "SDP", "@MIL", "MIA", "TBR", "@LAD", "@MIL", 
                              "@SFG", "ATL"), 
                      Batter = c("CJ Abrams", "Austin Nola", "Brice Turang", 
                                 "Garrett Cooper", "Isaac Paredes", "J.D. Martinez", "Brian Anderson", 
                                 "Brandon Crawford", "Matt Olson"), 
                      Score = c("ahead 7-4", "ahead 2-0", 
                                "down 0-6", "ahead 6-0", "tied 0-0", "tied 0-0", "down 0-5", 
                                "down 0-1", "down 0-1"), 
                      Event = c("HR", "SO", "HR", "HR", "HR", 
                                "HR", "HR", "HR", "HR"), 
                      Inn = c("t8", "t5", "b5", "t8", "t3", 
                              "b1", "b7", "b1", "t5"), 
                      RoB = c("123", "1--", "123", "12-", 
                              "12-", "1--", "12-", "12-", "1-3"), 
                      Out = c(1, 2, 1, 2, 2, 1, 
                              2, 2, 2), 
                      Pitcnt = c("2 (0-1)", "11 (3-2)", "2 (0-1)", "8 (3-2)", 
                                 "6 (3-2)", "10 (3-2)", "2 (0-1)", "6 (3-2)", "1 (0-0)"), 
                      R = c(4, 0, 4, 3, 3, 2, 3, 3, 3), 
                      WPA = c(-0.54, 0.03, -0.01, 
                              -0.02, -0.29, -0.16, -0.01, -0.21, -0.25), 
                      RE24 = c(-2.71, 
                               0.22, -2.7, -2.68, -2.68, -1.73, -2.67, -2.67, -2.62), 
                      LI = c(3.9, 
                             0.98, 0.07, 0.16, 1.89, 1.15, 0.08, 1.42, 1.91), 
                      PlayDesc = c("Home Run (Fly Ball to Deep CF-RF); Thomas Scores/unER; Garrett Scores; Robles Scores", 
                                   "Strikeout Swinging", "Home Run (Fly Ball to Deep CF-RF); Tellez Scores; Mitchell Scores; Anderson Scores", 
                                   "Home Run (Fly Ball to Deep CF-RF); Arraez Scores; Soler Scores", 
                                   "Home Run (Fly Ball to Deep LF); Ramírez Scores; Franco Scores", 
                                   "Home Run (Fly Ball to Deep CF-RF); Freeman Scores", "Home Run (Fly Ball to Deep CF); Yelich Scores; Adames Scores", 
                                   "Home Run (Fly Ball to Deep RF Line); Conforto Scores; Flores Scores", 
                                   "Home Run (Fly Ball to Deep CF-RF); Harris Scores; Acuña Scores"
                      )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L))

str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"

PITCHES <- pc3 %>% mutate(
  pitches = gsub(str.regex.pattern, "\\1", Pitcnt), 
  x = gsub(str.regex.pattern, "\\2", Pitcnt), 
  y = gsub(str.regex.pattern, "\\3", Pitcnt)
)

PITCHES[, c("Pitcnt", "pitches", "x","y")]
#> # A tibble: 9 × 4
#>   Pitcnt   pitches x     y    
#>   <chr>    <chr>   <chr> <chr>
#> 1 2 (0-1)  2       0     1    
#> 2 11 (3-2) 11      3     2    
#> 3 2 (0-1)  2       0     1    
#> 4 8 (3-2)  8       3     2    
#> 5 6 (3-2)  6       3     2    
#> 6 10 (3-2) 10      3     2    
#> 7 2 (0-1)  2       0     1    
#> 8 6 (3-2)  6       3     2    
#> 9 1 (0-0)  1       0     0

Created on 2023-05-27 with reprex v2.0.2

The mystery continues. Using the same code (except that I named the data frame df instead of pc3), this is what I get:

str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"

PITCHES <- df %>% mutate(
  pitches = gsub(str.regex.pattern, "\\1", Pitcnt), 
  x = gsub(str.regex.pattern, "\\2", Pitcnt), 
  y = gsub(str.regex.pattern, "\\3", Pitcnt)
)

PITCHES[, c("Pitcnt", "pitches", "x","y")]

To make the comparison as similar as possible, try the following:

1. Start a fresh R session so there is nothing in the global environment.
2. Name the data frame DF so it matches mine and does not conflict with the function df().
3. Load the dplyr package and nothing else.
4. Run the code.
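The naming advice in step 2 is about `df`, which is already the F-distribution density function in the stats package. The sketch below uses base R's `find()` to show where the name resolves before and after assigning a data frame to it. If the data frame was never created in the current session, `df` still refers to the function, so code like `df %>% mutate(...)` would typically fail with an error about an object of class "function" rather than a missing object.

```r
# In a fresh session, `df` resolves only to stats::df()
before <- find("df")
print(before)

# Assigning a data frame creates a second `df` in the global environment
df <- data.frame(Pitcnt = c("2 (0-1)", "11 (3-2)"))
after <- find("df")
print(after)

# A call such as df(1, 2, 3) still reaches stats::df, because R skips
# non-function objects when looking up a name used as a function
print(df(1, 2, 3))
```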

I'm really hoping that works, though it is 99% magical thinking.

I followed your suggestion, but it didn't make any difference. I'm going to explore alternative ways of coding this that don't use gsub(). I really appreciate all your help.
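For anyone exploring alternatives to `gsub()`, one option in base R is `utils::strcapture()`, which applies the capture groups and returns them as typed columns. Unlike `gsub()`, it fills rows the pattern does not match with `NA`, so problem rows stand out instead of repeating the input. This is a sketch with made-up values (the third deliberately uses an en dash); it has not been tested against the poster's full dataset.

```r
str.regex.pattern <- "(\\d+)\\s\\((\\d+)\\-(\\d+)\\)"

# Made-up inputs; the third has an en dash (U+2013) instead of a hyphen
Pitcnt <- c("12 (3-2)", "7 (0-1)", "3 (0\u20132)")

# proto sets the names and types of the three capture groups
parts <- utils::strcapture(
  str.regex.pattern, Pitcnt,
  proto = data.frame(pitches = integer(), x = integer(), y = integer())
)
print(parts)
```

With dplyr loaded, `bind_cols(pc3, parts)` (where `parts` is computed from `pc3$Pitcnt`) would attach the new columns to the original tibble.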
