I'm working with baseball data. I've created a few separate df
that does specific calculations that all are to get used on a master df
called value_tbl
that has the bulk of my mutate
work. The first approach I used was a join of the calc_tbl
to the players
in long format. That didn't work due to the calc_BA
requiring elements from the lg_avg
df
(Unless I just approached it wrong)
I'm looking for a better and more efficient way to go about writing my last df
. Specifically:
- Is there a better way to "call" the results from the
calc_tbl
andlg_avg
? - Is there a more efficient way to write that last mutate?
Open to suggestions on the overall workflow as well.
library(tidyverse)
# Table used for league average calculations below
teams <- tibble::tribble(
~Team, ~PA, ~AB, ~H, ~HBP, ~BB, ~HR, ~R, ~RBI, ~SB, ~AVG, ~OBP,
"Athletics", 6255, 5579, 1407, 76, 550, 227, 813, 778, 35, 0.252, 0.325,
"Red Sox", 6302, 5623, 1509, 55, 569, 208, 876, 829, 125, 0.268, 0.339,
"Yankees", 6271, 5515, 1374, 62, 625, 267, 851, 821, 63, 0.249, 0.329,
"Indians", 6300, 5595, 1447, 80, 554, 216, 818, 786, 135, 0.259, 0.332,
"Astros", 6146, 5453, 1390, 61, 565, 205, 797, 763, 71, 0.255, 0.329
)
# Table used for player calculations (main table)
players <- tibble::tribble(
~Name, ~Team, ~G, ~PA, ~AB, ~H, ~HBP, ~BB, ~HR, ~R, ~RBI, ~SB, ~AVG, ~OBP,
"Mookie Betts", "Red Sox", 136, 614, 520, 180, 8, 81, 32, 129, 80, 30, 0.346, 0.438,
"Mike Trout", "Angels", 140, 608, 471, 147, 10, 122, 39, 101, 79, 24, 0.312, 0.46,
"J.D. Martinez", "Red Sox", 150, 649, 569, 188, 4, 69, 43, 111, 130, 6, 0.33, 0.402,
"Alex Bregman", "Astros", 157, 705, 594, 170, 12, 96, 31, 105, 103, 10, 0.286, 0.394,
"Jose Ramirez", "Indians", 157, 698, 578, 156, 8, 106, 39, 110, 105, 34, 0.27, 0.387
)
# Denominators needed for calculations
calc_tbl <- tibble::tribble(
~data_col, ~calc_denom,
"HR", 14.3,
"R", 19.6,
"RBI", 17.5,
"SB", 26.2,
"AVG", 0.0045,
"OBP", 0.0031
) %>%
spread(key = data_col, value = "calc_denom")
# Get league average of teams
lg_avg <- teams %>%
# Divide counting stats by 10 to get the averages for 10 batters
mutate_at(vars(PA:SB), funs(./10)) %>%
summarize_if(is.numeric, mean, na.rm=TRUE)
lg_avg
#> # A tibble: 1 x 11
#> PA AB H HBP BB HR R RBI SB AVG OBP
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 625. 555. 143. 6.68 57.3 22.5 83.1 79.5 8.58 0.257 0.331
# Calculate Values
value_tbl <- players %>%
mutate(calc_R = R / calc_tbl$R,
calc_HR = HR / calc_tbl$HR,
calc_RBI = RBI / calc_tbl$RBI,
calc_SB = SB / calc_tbl$SB,
calc_BA = (((lg_avg$H * 13 ) + H)/(AB + (lg_avg$AB * 13)) - lg_avg$AVG) / calc_tbl$AVG,
calc_Total = (calc_R + calc_HR + calc_RBI + calc_SB + calc_BA))
value_tbl
#> # A tibble: 5 x 20
#> Name Team G PA AB H HBP BB HR R RBI SB
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mookie… Red … 136 614 520 180 8 81 32 129 80 30
#> 2 Mike T… Ange… 140 608 471 147 10 122 39 101 79 24
#> 3 J.D. M… Red … 150 649 569 188 4 69 43 111 130 6
#> 4 Alex B… Astr… 157 705 594 170 12 96 31 105 103 10
#> 5 Jose R… Indi… 157 698 578 156 8 106 39 110 105 34
#> # ... with 8 more variables: AVG <dbl>, OBP <dbl>, calc_R <dbl>,
#> # calc_HR <dbl>, calc_RBI <dbl>, calc_SB <dbl>, calc_BA <dbl>,
#> # calc_Total <dbl>
Thanks in advance.