Recommendations for Summing over Different Variables

rwengstrom · March 25, 2020, 3:40am

Hi Everyone,

I am relatively new to working with large data sets and am looking for some guidance on how to proceed. I don't have any code yet, as I'm still working out the best way to do what I want. I have an idea of what I want the code to accomplish, but I'm not sure how best to go about it.

I'm including a snapshot of the table I'm using, so you can have a visual of the columns I'm working with. I'm not sure what else is necessary, but let me know if I can include anything else.

Basically, I am working with a dataset I created that has a list of values in the "Pitcher.ID" column. I want to sum the "Weighted Pitch Count" column for each unique Pitcher.ID number, so that I end up with each unique "Pitcher.ID" value alongside its total.

Thanks in advance!

FJCC · March 25, 2020, 3:48am

Let's say your data set is called DF. Then you can get the sum of Weighted.Pitch.Count for each Pitcher.ID with

library(dplyr)
# One row per Pitcher.ID with the total of Weighted.Pitch.Count
Stat <- DF %>%
    group_by(Pitcher.ID) %>%
    summarize(SumOfPitch = sum(Weighted.Pitch.Count))
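
If you would rather not load a package, the same grouped sum can be sketched in base R with aggregate(). The toy data below is made up for illustration, and the column names Pitcher.ID and Weighted.Pitch.Count are assumptions based on the names in the question, so adjust them to whatever your table actually uses.

```r
# Made-up toy data standing in for the real table (hypothetical values).
# Column names are assumed to be Pitcher.ID and Weighted.Pitch.Count.
DF <- data.frame(
  Pitcher.ID = c(101, 101, 102, 103, 103, 103),
  Weighted.Pitch.Count = c(10.5, 4.5, 20, 1, 2, 3)
)

# Sum Weighted.Pitch.Count within each unique Pitcher.ID.
Stat <- aggregate(Weighted.Pitch.Count ~ Pitcher.ID, data = DF, FUN = sum)

# Rename the summed column so it is not confused with the raw values.
names(Stat)[2] <- "SumOfPitch"

print(Stat)
```

The result has one row per unique Pitcher.ID. Note that the formula interface of aggregate() drops rows with missing values by default, so check for NAs first if that matters for your totals.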
