I was wondering if there is a way to calculate this in R.
Let's say pitcher A throws six pitches.
Fastball, Fastball, Fastball, Fastball, Slider, Fastball
Pitcher B also throws six pitches
Fastball, Slider, Fastball, Slider, Fastball, Slider
Is there a way to calculate how often a pitcher throws a different pitch from the one right before it? Looking at the sequences above, we can see that Pitcher B throws a different pitch than the previous one 100% of the time.
One way to do this is to use the dplyr window functions, specifically lag(), which returns the value from the previous row so that you can compare it with the value in the current row.
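To see what lag() does on its own, here is a small sketch on a plain character vector. The names `pitches`, `prev`, and `changed` are made up for illustration; the values are Pitcher A's six pitches from the question.

```r
library(dplyr)

# Illustrative vector: Pitcher A's six pitches, in order
pitches <- c("Fastball", "Fastball", "Fastball", "Fastball", "Slider", "Fastball")

# lag() shifts the values down by one position;
# the first element has no predecessor, so it becomes NA
prev <- lag(pitches)
prev
#> [1] NA         "Fastball" "Fastball" "Fastball" "Fastball" "Slider"

# Comparing each pitch with the one before it gives a logical vector
changed <- pitches != prev
changed
#> [1]    NA FALSE FALSE FALSE  TRUE  TRUE
```

The leading NA is why the first pitch of each pitcher cannot be counted as either a change or a repeat.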
I did this in two steps so you can see what happens when you create a new variable using lag() within mutate(). Importantly, I am also grouping by pitcher so that each pitcher's series of pitches is considered separately. The first pitch in each group has no previous pitch, so its comparison is NA. The second step is to calculate the proportion of pitches that were different from the one before (still grouped by pitcher). I used mean(diff_pitch, na.rm = TRUE): TRUE evaluates to 1 and FALSE evaluates to 0, so taking the average gives the proportion that were different, and na.rm = TRUE drops the NA from each pitcher's first pitch.
library(tidyverse)
df <- tribble(
  ~pitcher, ~pitch,
  "A", "Fastball",
  "A", "Fastball",
  "A", "Fastball",
  "A", "Fastball",
  "A", "Slider",
  "A", "Fastball",
  "B", "Fastball",
  "B", "Slider",
  "B", "Fastball",
  "B", "Slider",
  "B", "Fastball",
  "B", "Slider"
)
# Step 1: within each pitcher, flag pitches that differ from the previous one
pitch_change <- df %>%
  group_by(pitcher) %>%
  mutate(diff_pitch = pitch != lag(pitch)) %>%
  print()
#> # A tibble: 12 x 3
#> # Groups: pitcher [2]
#>    pitcher pitch    diff_pitch
#>    <chr>   <chr>    <lgl>
#>  1 A       Fastball NA
#>  2 A       Fastball FALSE
#>  3 A       Fastball FALSE
#>  4 A       Fastball FALSE
#>  5 A       Slider   TRUE
#>  6 A       Fastball TRUE
#>  7 B       Fastball NA
#>  8 B       Slider   TRUE
#>  9 B       Fastball TRUE
#> 10 B       Slider   TRUE
#> 11 B       Fastball TRUE
#> 12 B       Slider   TRUE
# Step 2: proportion of changed pitches per pitcher (data is still grouped)
pitch_change %>%
  summarize(diff_pitch_pct = mean(diff_pitch, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   pitcher diff_pitch_pct
#>   <chr>            <dbl>
#> 1 A                  0.4
#> 2 B                  1
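If you would rather not depend on dplyr, the same idea can probably be sketched in base R by comparing each pitch vector with a copy of itself shifted by one. This is an alternative to the approach above, not part of it, and `change_rate` is a hypothetical helper name. It assumes the rows are already in the order the pitches were thrown.

```r
# Same toy data as in the example above, built with base R only
df <- data.frame(
  pitcher = rep(c("A", "B"), each = 6),
  pitch = c("Fastball", "Fastball", "Fastball", "Fastball", "Slider", "Fastball",
            "Fastball", "Slider", "Fastball", "Slider", "Fastball", "Slider")
)

# Hypothetical helper: share of pitches that differ from the previous pitch
change_rate <- function(x) {
  if (length(x) < 2) return(NA_real_)  # a single pitch has nothing to compare to
  # x[-1] is pitches 2..n, x[-length(x)] is pitches 1..(n-1)
  mean(x[-1] != x[-length(x)])
}

# Apply the helper separately to each pitcher's sequence
rates <- tapply(df$pitch, df$pitcher, change_rate)
rates
#>   A   B 
#> 0.4 1.0 
```

Because the first pitch is simply left out of the comparison, there is no NA to remove here, and the result matches the 0.4 and 1 from the dplyr version.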