Confused about case_when

CianStryker · October 20, 2019, 4:10pm

So I'm having some trouble with this question. I understand conceptually what I'm trying to do, but I'm not sure how to make it happen. Essentially, I want to create a tibble that simulates flipping a coin 6 times and records how many heads came up. The twist is that the coin may or may not be fake: its probability of heads is one of five values (0.00, 0.25, 0.50, 0.75, 1.00), and I have a prior probability for each of these values.

To tackle this, I created a tibble that samples from my five probabilities, with the priors passed as the sampling weights. Then I try to use mutate() to create a number_heads column that should hold the number of heads, drawn according to the value in the p column. Inside it, I try to use case_when() to choose between five different samples, one for each probability. I'm not getting anywhere with this approach, though.

I know this strategy worked for me when I only had two probabilities and used if_else(). But now that I have five, I'm hitting a wall. Hopefully this makes sense.

Thanks in advance.

Q2 <- tibble(replicate = 1:1000) %>%
  mutate(p = sample(c("0.00", "0.25", ".50", ".75", "1.00"), size = 1000, replace = TRUE, c(.25, .05, .4, .05, .25))) %>%
  mutate(
    number_heads = case_when(
      
      sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(1, 0, 0, 0, 0, 0, 0)) ~ "0.00",
    
      sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0.1779785, 0.355957, 0.2966309, 0.1318359, 0.03295898, 0.004394531, 0.0002441406)) ~ "0.25",
    
      sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0.015625, 0.09375, 0.234375, 0.3125, 0.234375, 0.09375, 0.015625)) ~ "0.50",

      sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0.0002441406, 0.004394531, 0.03295898, 0.1318359, 0.2966309, 0.355957, 0.1779785)) ~ "0.75", 

      sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0, 0, 0, 0, 0, 0, 1)) ~ "1.00",
      
      TRUE ~ NA    
      ))

andresrcs · October 20, 2019, 4:38pm

Please do not post screenshots; they are not useful. Post formatted code instead. Here is how to do it.

Ideally, you should ask your question with a proper reproducible example, as explained in this guide:

FAQ: How to do a minimal reproducible example (reprex) for beginners


CianStryker · October 20, 2019, 4:45pm

My bad. Sorry about that. I added the formatted code instead. I don't really have a dataset to add in though. I do have my calculations for the probabilities, but that code is kinda a mess so I didn't want to junk up my post.

Edit: I removed the variables and replaced them with their actual values. Everything I have on my end is now in the code I posted.
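Side note on the hard-coded vectors: they appear to be Binomial(6, p) probabilities (for example, 0.75^6 ≈ 0.1779785 is the chance of zero heads when p = 0.25). Assuming that, a minimal sketch with base R's dbinom() rebuilds all five vectors instead of typing them out by hand. The names `p_values` and `head_probs` are illustrative, not from the thread.

```r
# Assumption: each hard-coded vector is P(0 heads), ..., P(6 heads)
# for 6 flips of a coin with P(heads) = p, i.e. a Binomial(6, p) pmf.
p_values <- c(0, 0.25, 0.5, 0.75, 1)

# One column per p from sapply(); transpose so each row is one p value.
head_probs <- t(sapply(p_values, function(p) dbinom(0:6, size = 6, prob = p)))

# Label rows with the same strings used in the question ("0.00", ..., "1.00")
# and columns with the number of heads.
rownames(head_probs) <- sprintf("%.2f", p_values)
colnames(head_probs) <- 0:6

head_probs
```

Each row sums to 1, and a row can be passed straight to `sample(0:6, ..., prob = head_probs["0.25", ])`.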

andresrcs · October 20, 2019, 9:43pm

Check the documentation for case_when(): the left-hand side of each formula must evaluate to a logical vector.

A sequence of two-sided formulas. The left hand side (LHS) determines which values match this case. The right hand side (RHS) provides the replacement value.

The LHS must evaluate to a logical vector. The RHS does not need to be logical, but all RHSs must evaluate to the same type of vector.
....
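To make the quoted rule concrete, here is a small sketch (the vector `p` and the labels are made up for illustration). Each left-hand side is a comparison, so it returns TRUE/FALSE for every element, and the matching right-hand side supplies the value. In the question, the left-hand sides were calls to sample(), which return numbers rather than logicals, so they cannot act as conditions. The fallback uses a typed NA (NA_character_) so every right-hand side has the same type; some older dplyr versions reject a plain logical NA mixed with character values.

```r
library(dplyr)

p <- c("0.00", "0.50", "1.00", "0.30")

label <- case_when(
  p == "0.00" ~ "always tails",   # LHS: logical vector, same length as p
  p == "1.00" ~ "always heads",
  p == "0.50" ~ "fair coin",
  TRUE ~ NA_character_            # typed NA, same type as the other RHSs
)

label
```

The fourth element matches none of the conditions, so it falls through to the `TRUE ~` line and becomes NA.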

So you have to do something like this:

library(dplyr)

tibble(replicate = 1:1000) %>%
    mutate(p = sample(c("0.00", "0.25", "0.50", "0.75", "1.00"), size = 1000, replace = TRUE, prob = c(.25, .05, .4, .05, .25)),
           number_heads = case_when(
               p == "0.00" ~ sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(1, 0, 0, 0, 0, 0, 0)),
               p == "0.25" ~ sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0.1779785, 0.355957, 0.2966309, 0.1318359, 0.03295898, 0.004394531, 0.0002441406)),
               p == "0.50" ~ sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0.015625, 0.09375, 0.234375, 0.3125, 0.234375, 0.09375, 0.015625)),
               p == "0.75" ~ sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0.0002441406, 0.004394531, 0.03295898, 0.1318359, 0.2966309, 0.355957, 0.1779785)), 
               p == "1.00" ~ sample(c(0, 1, 2, 3, 4, 5, 6), size = 1000, replace = TRUE, prob = c(0, 0, 0, 0, 0, 0, 1)),
               TRUE ~ NA_real_
           ))
#> # A tibble: 1,000 x 3
#>    replicate p     number_heads
#>        <int> <chr>        <dbl>
#>  1         1 0.50             2
#>  2         2 0.50             3
#>  3         3 0.00             0
#>  4         4 1.00             6
#>  5         5 0.50             2
#>  6         6 1.00             6
#>  7         7 0.50             3
#>  8         8 0.00             0
#>  9         9 0.50             4
#> 10        10 0.75             5
#> # … with 990 more rows

Created on 2019-10-20 by the reprex package (v0.3.0.9000)
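For comparison, here is a more compact sketch that is not from the thread: base R's rbinom() is vectorised over its prob argument, so if p is stored as a number rather than a string, each row can draw its own number of heads directly, with no case_when() at all. The seed and the names `p_values`, `priors` and `sims` are arbitrary choices for illustration.

```r
library(dplyr)

set.seed(2019)  # arbitrary seed, only so the draws are reproducible

p_values <- c(0, 0.25, 0.5, 0.75, 1)      # candidate P(heads), as numbers
priors   <- c(0.25, 0.05, 0.4, 0.05, 0.25) # prior weight for each candidate

sims <- tibble(replicate = 1:1000) %>%
  mutate(
    # draw one p per replicate, weighted by the priors
    p = sample(p_values, size = n(), replace = TRUE, prob = priors),
    # rbinom() recycles prob element-wise, so row i uses its own p
    number_heads = rbinom(n(), size = 6, prob = p)
  )

sims %>% count(p, number_heads)
```

count() tabulates how often each (p, number_heads) pair occurred; rows with p = 0 always show 0 heads and rows with p = 1 always show 6.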

CianStryker · October 20, 2019, 10:05pm

Thank you so much! This has been killing me, and I suspected I was making a simple mistake with case_when.
