I'm in the process of analyzing baseball team data in R. I have historical team stats going back a while. All of the stat categories make up the columns, with the season and teams residing in rows (data is broken up by year).
I'm looking to create 2 data.frames
from these stats.
- Summary of Stats to include: Weighted Average of each column based on a weighting of (.4 to 2018, .35 to 2017, and .25 to 2016), Most current 3 year average (in this case we are talking 2016-2018), and All Year Average
- Slope of the results of the
data.frame
in 1.
I've started my code, but got stuck. Here's what I have so far. I'm probably better off using summarize_at
so that I can skip over the team names, and season column.
library(tidyverse)
tibble::tribble(
~season, ~team, ~hr, ~sb, ~w, ~sv,
2015L, "Blue Jays", 232L, 88L, 93L, 34L,
2016L, "Red Sox", 208L, 83L, 93L, 43L,
2017L, "Astros", 238L, 98L, 101L, 45L,
2018L, "Athletics", 227L, 35L, 97L, 44L,
2018L, "Red Sox", 208L, 125L, 108L, 46L,
2018L, "Yankees", 267L, 63L, 100L, 49L,
2017L, "Yankees", 241L, 90L, 91L, 36L,
2018L, "Indians", 216L, 135L, 91L, 41L,
2017L, "Indians", 212L, 88L, 102L, 37L,
2018L, "Astros", 205L, 71L, 103L, 46L
)
#> # A tibble: 10 x 6
#> season team hr sb w sv
#> <int> <chr> <int> <int> <int> <int>
#> 1 2015 Blue Jays 232 88 93 34
#> 2 2016 Red Sox 208 83 93 43
#> 3 2017 Astros 238 98 101 45
#> 4 2018 Athletics 227 35 97 44
#> 5 2018 Red Sox 208 125 108 46
#> 6 2018 Yankees 267 63 100 49
#> 7 2017 Yankees 241 90 91 36
#> 8 2018 Indians 216 135 91 41
#> 9 2017 Indians 212 88 102 37
#> 10 2018 Astros 205 71 103 46
summary_table <- league_stats %>%
mutate(weight = case_when(
season == 2018 ~ .4,
season == 2017 ~ .35,
season == 2016 ~ .25,
TRUE ~ NA_real_
)) %>%
summarize_all(funs(mean))
The last step is that I'd like to use the results of the 2nd Table in a number table. For example, multipy the slope of column HR times HR for a projection. The new data.frame
would be projections data with the same columns.