One way to do this is to use the `dplyr` window functions, specifically `lag()`, which returns the value from the previous row so that you can compare each row to the one above it.

I did this in two steps so you can see what happens when you create a new variable using `lag()` within `mutate()`. Importantly, I am also grouping by pitcher so that the sequence of pitches is considered separately for each pitcher. The second step is to calculate the proportion of pitches that were different from the one before (still grouped by pitcher). I used `mean(diff_pitch, na.rm = TRUE)`, knowing that `TRUE` evaluates to `1` and `FALSE` evaluates to `0`, so taking the average gives the proportion that were different (multiply by 100 if you want a percentage). The `na.rm = TRUE` drops each pitcher's first pitch, which has no previous pitch to compare against and therefore comes out as `NA`.
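
To see the two building blocks in isolation, here is a minimal sketch on a plain vector. The `pitches` and `changed` names are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical four-pitch sequence, used only for illustration
pitches <- c("Fastball", "Fastball", "Slider", "Fastball")

# lag() shifts the vector by one position and pads the front with NA:
# NA, "Fastball", "Fastball", "Slider"
lag(pitches)

# Comparing each pitch to the previous one gives a logical vector:
# NA, FALSE, TRUE, TRUE
changed <- pitches != lag(pitches)

# TRUE counts as 1 and FALSE as 0, so the mean is the share of TRUE values.
# na.rm = TRUE drops the first pitch, which has nothing before it.
mean(changed, na.rm = TRUE)
#> [1] 0.6666667
```

Two of the three comparable pitches differ from the one before, hence 2/3.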

```
library(tidyverse)

# Example data: one row per pitch, in the order thrown
df <- tribble(
  ~pitcher, ~pitch,
  "A", "Fastball",
  "A", "Fastball",
  "A", "Fastball",
  "A", "Fastball",
  "A", "Slider",
  "A", "Fastball",
  "B", "Fastball",
  "B", "Slider",
  "B", "Fastball",
  "B", "Slider",
  "B", "Fastball",
  "B", "Slider"
)

# Step 1: within each pitcher, flag pitches that differ from the previous one
pitch_change <- df %>%
  group_by(pitcher) %>%
  mutate(diff_pitch = pitch != lag(pitch)) %>%
  print()
#> # A tibble: 12 x 3
#> # Groups:   pitcher [2]
#>    pitcher pitch    diff_pitch
#>    <chr>   <chr>    <lgl>
#>  1 A       Fastball NA
#>  2 A       Fastball FALSE
#>  3 A       Fastball FALSE
#>  4 A       Fastball FALSE
#>  5 A       Slider   TRUE
#>  6 A       Fastball TRUE
#>  7 B       Fastball NA
#>  8 B       Slider   TRUE
#>  9 B       Fastball TRUE
#> 10 B       Slider   TRUE
#> 11 B       Fastball TRUE
#> 12 B       Slider   TRUE

# Step 2: proportion of changed pitches per pitcher (data is still grouped)
pitch_change %>%
  summarize(diff_pitch_pct = mean(diff_pitch, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   pitcher diff_pitch_pct
#>   <chr>            <dbl>
#> 1 A                  0.4
#> 2 B                  1
```

Created on 2019-06-18 by the reprex package (v0.3.0)
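
If you don't need to inspect the intermediate `diff_pitch` column, the two steps can probably be collapsed into a single `summarize()` call. This is a sketch rather than a tested drop-in: the `pitch_change_pct` name is my own, and it rebuilds the same example data with `tibble()` so that it runs on its own.

```r
library(tidyverse)

# Same example data as above, rebuilt compactly
df <- tibble(
  pitcher = rep(c("A", "B"), each = 6),
  pitch = c(
    "Fastball", "Fastball", "Fastball", "Fastball", "Slider", "Fastball",
    "Fastball", "Slider", "Fastball", "Slider", "Fastball", "Slider"
  )
)

# lag() is evaluated within each group, so the comparison never crosses
# from one pitcher's last pitch to the next pitcher's first pitch
pitch_change_pct <- df %>%
  group_by(pitcher) %>%
  summarize(diff_pitch_pct = mean(pitch != lag(pitch), na.rm = TRUE))

pitch_change_pct
```

This should give the same `0.4` for pitcher A and `1` for pitcher B as the two-step version. Note that both versions assume the rows are already in the order the pitches were thrown.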