Calculate weighted variance by year

I'm trying to learn R to play around with various baseball-related databases. I struggle with R questions because my background is not in data science or data wrangling and I don't know the vocabulary, so apologies up front.

I have a data table that consists of 4 columns: playerID, yearID, AtBats, BattingAverage. There are a bunch of rows, as there is player data for each season in Major League Baseball History. Of course, BattingAverage was calculated as Hits/AtBats, but Hits aren't in my data table.

I want to construct a table of the annual weighted mean and weighted standard deviation of BattingAverage. Now, I know how to get a table of the AtBat-weighted BattingAverage:

Summary <- PlayerStats %>%
  group_by(yearID) %>%
  summarize(y.BattingAverage = weighted.mean(BattingAverage, AtBats))

The way that I got the annual weighted variance was to first left_join the PlayerStats table with the Summary table, and then compute a second weighted.mean, this time of the squared deviations from the annual mean:

PlayerStats <- PlayerStats %>% left_join(Summary, by = "yearID")

Summary <- PlayerStats %>%
  group_by(yearID) %>%
  summarize(var.BattingAvg = weighted.mean((BattingAverage - y.BattingAverage)^2, AtBats))

And this works, but it sure seems to me that I'm doing extra steps. On top of that, my summary table lost the annual weighted batting average column, and now only contains the annual weighted variance.

There are functions for the weighted variance in various packages. Hmisc has wtd.var. You could calculate the weighted mean and weighted variance in a single summarize().
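
As a rough sketch of the single-summarize() idea: later expressions inside summarize() can refer to columns created earlier in the same call, so the join is not needed and the mean column is kept. The toy data below is made up for illustration, and the sd.BattingAverage column is my own naming. The variance is computed with the same formula as in the question (squared deviations weighted by AtBats, divided by the total weight).

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question
PlayerStats <- tibble(
  playerID       = c("a", "b", "c", "d"),
  yearID         = c(2000, 2000, 2001, 2001),
  AtBats         = c(100, 300, 200, 200),
  BattingAverage = c(0.200, 0.300, 0.250, 0.350)
)

Summary <- PlayerStats %>%
  group_by(yearID) %>%
  summarize(
    # AtBats-weighted mean per year
    y.BattingAverage = weighted.mean(BattingAverage, AtBats),
    # weighted variance, reusing the mean computed on the line above
    var.BattingAverage = weighted.mean((BattingAverage - y.BattingAverage)^2, AtBats),
    # weighted standard deviation
    sd.BattingAverage = sqrt(var.BattingAverage)
  )

print(Summary)
```

If you would rather use Hmisc, I believe Hmisc::wtd.var(BattingAverage, AtBats, method = "ML") gives the same variance. As far as I know, its default method treats the weights as frequency counts and divides by sum(weights) - 1, so the default result will differ slightly from the formula above. Check ?wtd.var before relying on either.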
