How do I extract values from a column based upon positive or negative values in another column?

I am having undesired results with a dataset. I have separated the dataset (based on ID) into 2 dput files to allow for clarity. Here is the first dput

structure(list(ID = c("56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789"
), Book = c("Book_A", "Book_A", "Book_B", "Book_B", "Book_C", "Book_C", "Book_D", "Book_D", "Book_E", "Book_E", "Book_F", "Book_F"), Home = c("San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres"), 
Away = c("Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners"
), Team = c("San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners"
), Price = c(133, -162, 125, -155, 130, -160, 130, -150, 130, -150, 130, -155), Points = c(-1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

and here is the second dput

structure(list(ID = c("12345", "12345", 
"12345", "12345", 
"12345", "12345", 
"12345", "12345", 
"12345", "12345"
), Book = c("Book_A", "Book_A", "Book_B", 
"Book_B", "Book_C", "Book_C", "Book_D", "Book_D", "Book_E", 
"Book_E"), Home = c("Cincinnati Reds", "Cincinnati Reds", 
"Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", 
"Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds"
), Away = c("Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", 
"Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians"), Team = c("Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians"), Price = c(-175, 143, 160, -190, 140, -165, 145, -170, 150, -178), Points = c(1.5, -1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

I have a function that creates a final column (Value) with values that are calculated using the Price column. Here is the result of the first dput with the calculation run

# A tibble: 2 × 8
# Groups:   ID [1]
  ID                               Book              Home             Away             Team             Price Points Value
  <chr>                            <chr>             <chr>            <chr>            <chr>            <dbl>  <dbl> <dbl>
1 56789  Book_A   San Diego Padres Seattle Mariners San Diego Padres   133   -1.5    0.028
2 56789  Book_D   San Diego Padres Seattle Mariners Seattle Mariners  -150    1.5    0.028

and here is the second.

# A tibble: 2 × 8
# Groups:   ID [1]
  ID                               Book            Home            Away                Team                Price Points  Value
  <chr>                            <chr>          <chr>           <chr>               <chr>               <dbl>  <dbl>  <dbl>
1 12345  Book_B     Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5 -0.256
2 12345  Book_A     Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5 -0.256

Here is the syntax that I am currently using

library(bettoR)

df %>%
group_by(ID, Team) %>% 
    slice_max(Price, with_ties = FALSE) %>% 
    group_by(ID) %>% 
    mutate(Value = hold_calc(Price[1], Price[2]))

The output of the first dput, with calculation ran, is valid. The output of the second dput, with calculation ran, should return four rows but only returns two. The issue appears to be with the fact that each Team in the second dput has both a positive AND negative Points value whereas the first dput each Team has a single positive OR negative Points value. If you know about sports betting then you will understand that each wager on a Points spread will have both a positive and negative value.

Any idea how to get this to work? Again, bear in mind that grouping is important due to the fact that multiple ID's will exist within a single dataset. This is the desired output of dput2

# A tibble: 4 × 8
# Groups:   ID [1]
  ID    Book   Home            Away                Team                Price Points  Value
  <chr> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl>  <dbl>
1 12345 Book_B Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5  0.007
2 12345 Book_A Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5  0.046
3 12345 Book_A Cincinnati Reds Cleveland Guardians Cincinnati Reds      -175    1.5  0.046
4 12345 Book_C Cincinnati Reds Cleveland Guardians Cleveland Guardians  -165    1.5  0.007

I have posted in Stackoverflow but several days have elapsed with no further guidance. The dput referenced in the Stackoverflow post is stale, hence the difference. The link is here

is not accurate. Using a reprex. See the FAQ catches problems like this. There must have been something else going on in the session in which you produced the output shown.

library(dplyr)

d1 <- data.frame(
  ID = c("56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789", "56789"), Book = c("Book_A", "Book_A", "Book_B", "Book_B", "Book_C", "Book_C", "Book_D", "Book_D", "Book_E", "Book_E", "Book_F", "Book_F"), Home = c("San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres", "San Diego Padres"),
  Away = c("Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners", "Seattle Mariners"), Team = c("San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners", "San Diego Padres", "Seattle Mariners"), Price = c(133, -162, 125, -155, 130, -160, 130, -150, 130, -150, 130, -155), Points = c(-1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5)
)

d2 <- data.frame(ID = c(
  "12345", "12345",
  "12345", "12345",
  "12345", "12345",
  "12345", "12345",
  "12345", "12345"
), Book = c(
  "Book_A", "Book_A", "Book_B",
  "Book_B", "Book_C", "Book_C", "Book_D", "Book_D", "Book_E",
  "Book_E"
), Home = c(
  "Cincinnati Reds", "Cincinnati Reds",
  "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds",
  "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds", "Cincinnati Reds"
), Away = c(
  "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians",
  "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians", "Cleveland Guardians"
), Team = c("Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians", "Cincinnati Reds", "Cleveland Guardians"), Price = c(-175, 143, 160, -190, 140, -165, 145, -170, 150, -178), Points = c(1.5, -1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5, -1.5, 1.5))


d1 %>%
  group_by(ID, Team) %>% 
  slice_max(Price, with_ties = FALSE) %>% 
  group_by(ID) %>% 
  mutate(Value = function(Price[1], Price[2]))
# yields
#Error: unexpected '[' in:
#"  group_by(ID) %>% 
# mutate(Value = function(Price["
d2 %>%
  group_by(ID, Team) %>% 
  slice_max(Price, with_ties = FALSE) %>% 
  group_by(ID) %>% 
  mutate(Value = function(Price[1], Price[2]))
# same error

Combined with the lack of a clear description of of Value is derived, no path to a solution is apparent.

Hi Technocrat and thank you very much for your assistance. The error you are getting is with the function as you have clearly called out. My apologies for not including it. With that being said, here is the library and corrected call to hold_calc. I have edited the main post.

library(bettoR)

df %>%
        group_by(ID, Team) %>% 
        slice_max(Price, with_ties = FALSE) %>% 
        group_by(ID) %>% 
        mutate(Value = hold_calc(Price[1], Price[2]))

Here is a link to the github code if you would like to take a look.

Basically, the function calculates a value based on the Price column. When I run it against the first dput the calculations are correct. When I run it against the second dput the calculations aren't correct. The issue appears to be with the fact that the second dput has both a positive and negative Points value. In my main post I said Prices but I meant Points. This has been edited in the main post. It appears as if the df needs to be grouped further, based on the Points column. If you know about sports betting then you will understand that each wager on a Points spread will have both a positive and negative value. The final output of dput should look like this

# A tibble: 4 × 8
# Groups:   ID [1]
  ID    Book   Home            Away                Team                Price Points  Value
  <chr> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl>  <dbl>
1 12345 Book_B Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5  0.007
2 12345 Book_A Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5  0.046
3 12345 Book_A Cincinnati Reds Cleveland Guardians Cincinnati Reds      -175    1.5  0.046
4 12345 Book_C Cincinnati Reds Cleveland Guardians Cleveland Guardians  -165    1.5  0.007

The mix of positive and negative value isn't to blame

# copied from bettoR
implied_prob <- function(odds, type = "us") {
  ## Error Handling
  if (!is.numeric(odds)) {
    stop("Odds must be numeric")
  }
  if (!type %in% c("us", "frac", "dec")) {
    stop("type must be either: ('us', 'dec', or 'frac')")
  }
  if (type == "us") {
    imp_prob <- odds
    imp_prob[] <- NA_real_
    imp_prob[which(odds <= 0)] <-
      1 / (1 - 100 / odds[which(odds <= -100)])
    imp_prob[which(odds > 0)] <-
      1 / (1 + odds[which(odds >= 100)] / 100)
    imp_prob
  }
  if (type == "dec") {
    imp_prob <- odds
    imp_prob[] <- NA_real_
    imp_prob[which(odds > 1)] <- 1 / odds[which(odds > 1)]
    imp_prob
  }
  if (type == "frac") {
    odds <- odds + 1
    imp_prob <- odds
    imp_prob[] <- NA_real_
    imp_prob[which(odds > 1)] <- 1 / odds[which(odds > 1)]
    imp_prob
  }
  return(imp_prob)
}

# example shows that mixed sign input is processed

implied_prob(c(360, -500, -110, 140))
#> [1] 0.2173913 0.8333333 0.5238095 0.4166667

# lines is a vector

lines <- c(360, -500, -110, 140)

# and is used like this in hold_calc
sapply(lines, implied_prob)
#> [1] 0.2173913 0.8333333 0.5238095 0.4166667

With sports wagering, there are 2 sides ONLY to each wager. One side is a positive value and the other side negative. Taking dput1 for example we can see that the San Diego Padres are -1.5 Points and the Seattle Mariners are 1.5 Points. Each side has a positive OR negative number and this is what is expected. We can then take the 2 Price values that are associated and plug them into the hold_calc function.

hold_calc(133, -150)

which returns the value 0.0283

Taking dput2 we can see a similar scenario, however each Team has both a positive AND a negative Point value. If I divide the dput2 output further, I think it will become clearer

# A tibble: 2 × 7
     ID Book   Home            Away                Team                Price Points
  <dbl> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl>
1 12345 Book_A Cincinnati Reds Cleveland Guardians Cincinnati Reds      -175    1.5
2 12345 Book_A Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5
# A tibble: 8 × 7
     ID Book   Home            Away                Team                Price Points
  <dbl> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl>
1 12345 Book_B Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5
2 12345 Book_B Cincinnati Reds Cleveland Guardians Cleveland Guardians  -190    1.5
3 12345 Book_C Cincinnati Reds Cleveland Guardians Cincinnati Reds       140   -1.5
4 12345 Book_C Cincinnati Reds Cleveland Guardians Cleveland Guardians  -165    1.5
5 12345 Book_D Cincinnati Reds Cleveland Guardians Cincinnati Reds       145   -1.5
6 12345 Book_D Cincinnati Reds Cleveland Guardians Cleveland Guardians  -170    1.5
7 12345 Book_E Cincinnati Reds Cleveland Guardians Cincinnati Reds       150   -1.5
8 12345 Book_E Cincinnati Reds Cleveland Guardians Cleveland Guardians  -178    1.5

Now you can see that each Team has a positive AND a negative Points value. If I take the associated Prices and plug them into the hold_calc I get the following

hold_calc(-174,143)

which returns the value of 0.0445

hold_calc(160,-165)

which returns the value of 0.0072

The hold_calc requires the 2 largest Price values and that is why I am using the slice_max syntax. If I use slice_max against the current format of dput2, I get two values which are both negative and this is won't calculate correctly.

  ID    Book   Home            Away                Team                Price Points  Value
  <chr> <chr>  <chr>           <chr>               <chr>               <dbl>  <dbl>  <dbl>
1 12345 Book_B Cincinnati Reds Cleveland Guardians Cincinnati Reds       160   -1.5 -0.256
2 12345 Book_A Cincinnati Reds Cleveland Guardians Cleveland Guardians   143   -1.5 -0.256

So I believe the issue ultimately stems from the fact that each Team has both a positive AND negative Points value within dput2.

1 Like

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.