Sum based on variable level with dplyr


#1

Hello,

I am a novice R user and am currently experimenting with different functions in RStudio. I have a dataset (matchdb2017) with 13 variables. One variable, Period.Name (a factor), has two levels: ‘Quarter 1’ and ‘Quarter 2’. I’m trying to use dplyr to sum the Quarter 1 and Quarter 2 values for each of the three numeric variables in my dataset, grouped by Player.Name, Session.Type and variable.

This is my code so far. I have not melted my dataset or reshaped it to long format. Do I need to?

summary <- matchdb2017 %>%
  group_by(Player.Name, Session.Type, variable) %>%

I’m stuck on the next step: summing Quarter 1 and Quarter 2 (both levels of Period.Name) together. Any suggestions would be very much appreciated.

Thank you.


#2

Can you post a couple of rows of your data? I think that will make it clearer what you are trying to achieve. Also, try to keep it minimal, e.g., you don’t need to include all 13 variables if only 5 of them are needed.

Overall, I think you are on the right track.
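A minimal sketch of the next step, assuming the column names from the sample rows posted in this thread and using a hypothetical stand-in data frame `quarters` in place of matchdb2017. It stays in wide format, so no reshaping is needed: grouping by Player.Name and Session.Type and summing each numeric column adds Quarter 1 and Quarter 2 together. `across()` requires dplyr 1.0.0 or later; on older versions `summarise_at(vars(Average.Distance, Total.HIR, Velocity.Band.6), sum)` does the same job.

```r
library(dplyr)

# Hypothetical stand-in for matchdb2017, built from the four sample rows
quarters <- data.frame(
  Player.Name      = c("Player 1", "Player 2", "Player 1", "Player 2"),
  Period.Name      = factor(c("Quarter 1", "Quarter 1", "Quarter 2", "Quarter 2")),
  Average.Distance = c(2948.245, 3537.611, 3272.815, 3612.610),
  Total.HIR        = c(1245.07, 1230.86, 1039.59, 1076.92),
  Velocity.Band.6  = c(99.38, 19.68999, 14.94, 12.19),
  Session.Type     = "PM 1",
  stringsAsFactors = FALSE
)

summary_tbl <- quarters %>%
  # keep only the two periods being combined
  filter(Period.Name %in% c("Quarter 1", "Quarter 2")) %>%
  group_by(Player.Name, Session.Type) %>%
  # one summed column per numeric variable (across() needs dplyr >= 1.0.0)
  summarise(across(c(Average.Distance, Total.HIR, Velocity.Band.6), sum),
            .groups = "drop")

summary_tbl
```

With the four sample rows this gives one row per player; for example, Player 1 gets an Average.Distance of 6221.06 (2948.245 + 3272.815).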


#3

Player.Name  Period.Name  Average.Distance  Total.HIR  Velocity.Band.6  Date    Session.Type
Player 1     Quarter 1    2948.245          1245.07    99.380000        1/4/17  PM 1
Player 2     Quarter 1    3537.611          1230.86    19.689990        1/4/17  PM 1
Player 1     Quarter 2    3272.815          1039.59    14.940000        1/4/17  PM 1
Player 2     Quarter 2    3612.610          1076.92    12.190000        1/4/17  PM 1

Basically, I want to sum Quarter 1 and Quarter 2 together for each of the Average.Distance, Total.HIR and Velocity.Band.6 variables, for each player and each Session.Type (here, PM 1). Obviously, this is only a few rows as an example; there are many more levels of Session.Type.

Hope this helps. Thanks.
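If you do want the `variable` column from the original `group_by()`, reshaping to long format is one way to get it. This sketch uses `tidyr::pivot_longer()` (tidyr 1.0.0 or later; `reshape2::melt()` or `tidyr::gather()` fill the same role in older code) on a hypothetical `quarters` data frame rebuilt from the four rows above, then sums each variable per player and session.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for matchdb2017, built from the four sample rows
quarters <- data.frame(
  Player.Name      = c("Player 1", "Player 2", "Player 1", "Player 2"),
  Period.Name      = factor(c("Quarter 1", "Quarter 1", "Quarter 2", "Quarter 2")),
  Average.Distance = c(2948.245, 3537.611, 3272.815, 3612.610),
  Total.HIR        = c(1245.07, 1230.86, 1039.59, 1076.92),
  Velocity.Band.6  = c(99.38, 19.68999, 14.94, 12.19),
  Session.Type     = "PM 1",
  stringsAsFactors = FALSE
)

long_totals <- quarters %>%
  # one row per player / period / measurement, with a `variable` column
  pivot_longer(c(Average.Distance, Total.HIR, Velocity.Band.6),
               names_to = "variable", values_to = "value") %>%
  group_by(Player.Name, Session.Type, variable) %>%
  # adding up the values collapses Quarter 1 and Quarter 2 into one total
  summarise(total = sum(value), .groups = "drop")

long_totals
```

The wide approach keeps one column per measurement, while the long one gives one row per player, session and variable, which is often easier to plot or filter afterwards.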


#4

What we really need is a reprex (reproducible example) that builds the table you are working with.

What does a date like 1/4/17 PM mean?

Here is a reprex that builds such a table:

suppressPackageStartupMessages(library(tidyverse))
mdb <- tribble(
         ~Player.Name, ~Period.Name, ~Average.Distance, ~Total.HIR, ~Velocity.Band.6, ~Date, ~Session.Type,
         "Player 1", "Quarter 1", 2948.245, 1245.07, 99.380000, "1/4/17 PM", 1,
         "Player 2", "Quarter 1", 3537.611, 1230.86, 19.689990, "1/4/17 PM", 1,
         "Player 1", "Quarter 2", 3272.815, 1039.59, 14.940000, "1/4/17 PM", 1,
         "Player 2", "Quarter 2", 3612.610, 1076.92, 12.190000, "1/4/17 PM",  1)

mdb
#> # A tibble: 4 x 7
#>   Player.Name Period.Name Average.Distance Total.HIR Velocity.Band.6 Date 
#>   <chr>       <chr>                  <dbl>     <dbl>           <dbl> <chr>
#> 1 Player 1    Quarter 1              2948.     1245.            99.4 1/4/…
#> 2 Player 2    Quarter 1              3538.     1231.            19.7 1/4/…
#> 3 Player 1    Quarter 2              3273.     1040.            14.9 1/4/…
#> 4 Player 2    Quarter 2              3613.     1077.            12.2 1/4/…
#> # ... with 1 more variable: Session.Type <dbl>

Created on 2018-03-07 by the reprex package (v0.2.0).

More info about reprex is here https://www.jessemaegan.com/post/so-you-ve-been-asked-to-make-a-reprex/
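One common way to get a few rows of your own data into a reprex without retyping them is base R's `dput()`, which prints R code that recreates an object. This is a minimal sketch; `sample_rows` is a hypothetical stand-in for something like `head(matchdb2017)`.

```r
# Hypothetical stand-in for head(matchdb2017)
sample_rows <- data.frame(
  Player.Name = c("Player 1", "Player 2"),
  Period.Name = factor(c("Quarter 1", "Quarter 1"),
                       levels = c("Quarter 1", "Quarter 2")),
  Total.HIR   = c(1245.07, 1230.86),
  stringsAsFactors = FALSE
)

# dput() prints code that recreates the data frame; paste that output into a post
dput(sample_rows)

# Anyone reading the post can rebuild the same data frame from that text
as_code <- paste(capture.output(dput(sample_rows)), collapse = "\n")
rebuilt <- eval(parse(text = as_code))
```

Because `dput()` keeps attributes such as factor levels, the rebuilt data frame matches the original, which is harder to guarantee when copying printed output by hand.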

Also, at present there is an issue with the CRAN version of reprex, so you should use the one on GitHub.

Until CRAN catches up with the latest version, install reprex with:

devtools::install_github("tidyverse/reprex")

Keep in mind that just about everyone here answers questions in their spare time, so it is appreciated if you do as much as you can to make it as easy as possible for people to help you.