Is there a way to calculate this? -Unique Data

cadebunton · June 18, 2019, 9:27pm

Hi,

I was wondering if there is a way to calculate this in R.

Let's say pitcher A throws six pitches.
Fastball, Fastball, Fastball, Fastball, Slider, Fastball

Pitcher B also throws six pitches
Fastball, Slider, Fastball, Slider, Fastball, Slider

Is there a way to calculate how often the pitcher throws a different pitch than the pitch right before it? So obviously, we can look and see that 100% of the time Pitcher B throws a different pitch than the previous one.

Thank you!

mfherman · June 19, 2019, 12:58am

One way to do this is to use the dplyr window functions, specifically lag(), which returns the value from the previous row so you can compare each pitch to the one thrown before it.

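I did this in two steps so you can see what happens when you create a new variable using lag() within mutate(). Importantly, I am also grouping by pitcher so that each pitcher's sequence of pitches is considered separately. That is why the first pitch for each pitcher comes out as NA: it has no earlier pitch to compare against. The second step is to calculate the share of pitches that were different from the one before (still grouped by pitcher). I used mean(diff_pitch, na.rm = TRUE), knowing that TRUE evaluates to 1 and FALSE evaluates to 0, so taking the average gives the proportion that were different (multiply by 100 for a percentage). The na.rm = TRUE drops the NA for each pitcher's first pitch, so pitcher A has 2 changes out of 5 comparisons, or 0.4.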

library(tidyverse)

df <- tribble(
  ~pitcher, ~pitch,
  "A", "Fastball",
  "A", "Fastball",
  "A", "Fastball",
  "A", "Fastball",
  "A", "Slider", 
  "A", "Fastball",
  "B", "Fastball",
  "B", "Slider",
  "B", "Fastball",
  "B", "Slider",
  "B", "Fastball", 
  "B", "Slider"
)

pitch_change <- df %>% 
  group_by(pitcher) %>% 
  mutate(diff_pitch = pitch != lag(pitch)) %>% 
  print()
#> # A tibble: 12 x 3
#> # Groups:   pitcher [2]
#>    pitcher pitch    diff_pitch
#>    <chr>   <chr>    <lgl>     
#>  1 A       Fastball NA        
#>  2 A       Fastball FALSE     
#>  3 A       Fastball FALSE     
#>  4 A       Fastball FALSE     
#>  5 A       Slider   TRUE      
#>  6 A       Fastball TRUE      
#>  7 B       Fastball NA        
#>  8 B       Slider   TRUE      
#>  9 B       Fastball TRUE      
#> 10 B       Slider   TRUE      
#> 11 B       Fastball TRUE      
#> 12 B       Slider   TRUE

pitch_change %>% 
  summarize(diff_pitch_pct = mean(diff_pitch, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   pitcher diff_pitch_pct
#>   <chr>            <dbl>
#> 1 A                  0.4
#> 2 B                  1

Created on 2019-06-18 by the reprex package (v0.3.0)
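
If you would rather not load the tidyverse, the same idea can be sketched in base R. This is only a minimal sketch, not the approach used above: change_rate is a hypothetical helper name (it does not come from any package), and it assumes the rows are already in the order the pitches were thrown. Instead of lag(), it compares elements 2..n of each pitcher's sequence against elements 1..(n-1), so there is no leading NA to drop.

```r
# Same example data as above, built with base R only.
pitches <- data.frame(
  pitcher = rep(c("A", "B"), each = 6),
  pitch = c("Fastball", "Fastball", "Fastball", "Fastball", "Slider", "Fastball",
            "Fastball", "Slider", "Fastball", "Slider", "Fastball", "Slider"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: proportion of pitches that differ from the previous one.
# x[-1] is every pitch except the first; x[-length(x)] is every pitch except
# the last, so the comparison lines each pitch up with the one before it.
change_rate <- function(x) {
  if (length(x) < 2) return(NA_real_)  # a single pitch has nothing to compare
  mean(x[-1] != x[-length(x)])
}

# Apply the helper separately to each pitcher's sequence of pitches.
rates <- tapply(pitches$pitch, pitches$pitcher, change_rate)
print(rates)
```

With the example data this should agree with the dplyr result above (0.4 for pitcher A and 1 for pitcher B). If your real data is not already sorted, you would need to order it by game and pitch number first, since both versions rely on row order.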

cadebunton · June 19, 2019, 1:07am

Thank you very much!
