I am getting unexpected results with a dataset. For clarity, I have split the dataset by ID into two dput outputs. Here is the first dput:
structure(list(ID = c("56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789"
), Book = c("Book_A", "Book_A", "Book_B", "Book_B", "Book_C", "Book_C", "Book_D", "Book_D", "Book_E", "Book_E", "Book_F", "Book_F"), Home = c("San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres"),
Away = c("Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners"
), Team = c("San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners"
), Price = c(133, -162, 125, -155, 130, -160, 130, -150, 130, -150, 130, -155), Points = c(-1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))
And here is the second dput:
structure(list(ID = c("12345", "12345",
"12345", "12345",
"12345", "12345",
"12345", "12345",
"12345", "12345"
), Book = c("Book_A", "Book_A", "Book_B",
"Book_B", "Book_C", "Book_C", "Book_D", "Book_D", "Book_E",
"Book_E"), Home = c("Cincinnati Reds", "Cincinnati Reds",
"Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds",
"Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds"
), Away = c("Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians",
"Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians"), Team = c("Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians"), Price = c(-175, 143, 160, -190, 140, -165, 145, -170, 150, -178), Points = c(1.5, -1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
I have a function that creates a final column (Value) whose values are calculated from the Price column. Here is the result for the first dput after running the calculation:
# A tibble: 2 × 8
# Groups:   ID [1]
  ID    Book   Home             Away             Team             Price Points Value
  <chr> <chr>  <chr>            <chr>            <chr>            <dbl>  <dbl> <dbl>
1 56789 Book_A San Diego Padres Seattle Mariners San Diego Padres   133   -1.5 0.028
2 56789 Book_D San Diego Padres Seattle Mariners Seattle Mariners  -150    1.5 0.028
And here is the result for the second dput:
# A tibble: 2 × 8
# Groups:   ID [1]
  ID    Book   Home            Away                Team                Price Points  Value
  <chr> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl>  <dbl>
1 12345 Book_B Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5 -0.256
2 12345 Book_A Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5 -0.256
Here is the code that I am currently using:
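library(dplyr)    # group_by(), slice_max(), mutate(), %>%
library(bettoR)   # hold_calc()

df %>%
  group_by(ID, Team) %>%
  # keep the highest Price per Team within each ID
  slice_max(Price, with_ties = FALSE) %>%
  group_by(ID) %>%
  # compute Value from the two prices left in each ID
  mutate(Value = hold_calc(Price[1], Price[2]))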
The output for the first dput is valid. The output for the second dput should contain four rows but contains only two. The cause appears to be that each Team in the second dput has both a positive and a negative Points value, whereas in the first dput each Team has only one Points value, either positive or negative. Because the code keeps one row per Team, the two rows retained for the second dput are both -1.5 prices (160 and 143), which belong to different wagers and should not be paired with each other. In sports betting, every point-spread wager has two sides: one team at the negative spread and the other team at the matching positive spread.
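To illustrate why the second result comes out negative, here is a small sketch in base R. It does not call bettoR; hold_sketch() and implied_prob() are hypothetical stand-ins, and the formula (1 minus the reciprocal of the summed implied probabilities) is my assumption about what hold_calc() computes. It does reproduce the Value figures shown in this post.

```r
# Hypothetical helper: implied probability from American odds.
implied_prob <- function(price) {
  ifelse(price > 0, 100 / (price + 100), -price / (-price + 100))
}

# Hypothetical stand-in for hold_calc(); the formula is an assumption.
hold_sketch <- function(price_a, price_b) {
  1 - 1 / (implied_prob(price_a) + implied_prob(price_b))
}

# What the current code pairs: the two -1.5 prices (different wagers).
wrong_pair <- round(hold_sketch(160, 143), 3)    # -0.256

# What should be paired: each -1.5 price with the opposing +1.5 price.
reds_minus <- round(hold_sketch(160, -165), 3)   # 0.007
guardians_minus <- round(hold_sketch(143, -175), 3)  # 0.046

print(c(wrong_pair, reds_minus, guardians_minus))
```

Pairing two underdog prices gives implied probabilities that sum to less than 1, which is why the hold is negative.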
Any idea how to get this to work? Bear in mind that the grouping is important because multiple IDs will exist within a single dataset. This is the desired output for the second dput:
# A tibble: 4 × 8
# Groups:   ID [1]
  ID    Book   Home            Away                Team                Price Points Value
  <chr> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl> <dbl>
1 12345 Book_B Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5 0.007
2 12345 Book_A Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5 0.046
3 12345 Book_A Cincinnati Reds Cleveland Guardians Cincinnati Reds      -175    1.5 0.046
4 12345 Book_C Cincinnati Reds Cleveland Guardians Cleveland Guardians  -165    1.5 0.007
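One direction that seems to produce these four values, offered only as a sketch and not a confirmed fix: keep the best Price per ID, Team and Points, then pair rows that belong to the same wager. The Line column (the spread expressed from the home team's side) is a helper I made up for the pairing. hold_sketch() is a hypothetical stand-in for hold_calc() with an assumed formula, and df2 is the second dput rebuilt by hand.

```r
library(dplyr)

# Hypothetical stand-in for bettoR::hold_calc(); the formula is an assumption.
implied_prob <- function(price) {
  ifelse(price > 0, 100 / (price + 100), -price / (-price + 100))
}
hold_sketch <- function(price_a, price_b) {
  1 - 1 / (implied_prob(price_a) + implied_prob(price_b))
}

# Second dput, rebuilt compactly.
df2 <- tibble(
  ID = "12345",
  Book = rep(c("Book_A", "Book_B", "Book_C", "Book_D", "Book_E"), each = 2),
  Home = "Cincinnati Reds",
  Away = "Cleveland Guardians",
  Team = rep(c("Cincinnati Reds", "Cleveland Guardians"), times = 5),
  Price = c(-175, 143, 160, -190, 140, -165, 145, -170, 150, -178),
  Points = c(1.5, -1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5)
)

result <- df2 %>%
  # best price for each side of each spread
  group_by(ID, Team, Points) %>%
  slice_max(Price, with_ties = FALSE) %>%
  ungroup() %>%
  # express the spread from the home team's side so both sides of a wager match
  mutate(Line = if_else(Team == Home, Points, -Points)) %>%
  group_by(ID, Line) %>%
  mutate(Value = round(hold_sketch(Price[1], Price[2]), 3)) %>%
  ungroup() %>%
  select(-Line)

print(result)
```

The row order differs from the desired output above, and this assumes each ID and Line combination ends up with exactly two rows, one per team.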
I have also posted this on Stack Overflow, but several days have passed without further guidance. The dput in that post is out of date, hence the difference from the data shown here. The link is here