Calculating the probability of a single die roll with replacement


#1

Calculate the probability of not seeing a 6 on a single roll.

p_no6 <- sample(6,1,replace=TRUE)

Am I on the right path, or do I need actual math?


#2

I'm a huge fan of doing simulations instead of doing actual math. I've managed to build a career on it!

Yes, you are on the right path. You simulated a single roll of the die. The replace parameter doesn't really matter since you're only drawing once. But once you start drawing more than one, it's important.


#3

Do I need the sample function?


#4

As @jdlong wrote above, you are on the right path. Yes, you do need the sample function if you want to simulate.
Think of the replace=TRUE argument as a setting that only matters when you sample more than one die: it allows the same face to come up again.
In this case you are assuming the dice are independent of each other.
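
To make the replace point concrete, here is a minimal sketch of six draws with and without replacement. The seed value and the variable names are just my own choices for illustration.

```r
# Sketch: how replace changes what sample() can return (seed is arbitrary).
set.seed(42)

# Without replacement, six draws from 1:6 are a permutation: no face repeats.
no_repl <- sample(6, 6, replace = FALSE)
anyDuplicated(no_repl) == 0   # TRUE for every seed

# With replacement, each draw is an independent roll, so faces may repeat.
with_repl <- sample(6, 6, replace = TRUE)
with_repl
```

Only the second call behaves like rolling a real die six times, which is why replace=TRUE is what you want for dice.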


#5

Whether you need the sample function depends on what you want to do. If you want to simulate a single die roll (or a vector of rolls), then sample seems like the most straightforward way I can think of. You can do a whole bunch of die rolls (100 million in this example) as follows:

# simulate 100 million rolls of a fair die
rolls <- sample(6, 100000000, replace = TRUE)
not6 <- rolls[rolls < 6]
## proportion of rolls that are not 6 (should be close to 5/6)
length(not6) / length(rolls)

#6

So just use the default param of replace = FALSE? p_no6 <- sample(6,size=1) is still not what I need.


#7

Right, 5/6 is the probability that it won't be 6 on a single die roll. How do I use that in the sample function?


#8

Definitely set replace to TRUE if you want to run a simulation, say of 100 rolls. The idea is that around 83 rolls should not be 6 (5/6 * 100 is about 83.3):

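set.seed(100)
rolls <- sample(6, 100, replace = TRUE)
counts <- table(rolls)                  # tally of each face
no_6 <- counts[names(counts) != "6"]    # drop the count for 6
sum(no_6)                               # simulated number of non-6 rolls
(5/6) * 100                             # expected number, about 83.3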

#9

This worked: p_no6 <- 1 - (1/6). The premise I was working under was that the probability of not seeing a six on a single roll is 5/6. 1/6 seems like the probability of seeing a six.

Why is 1/6 the probability of not seeing a six instead of 5/6?


#10

I did not need any simulation, just the probability. Is there a reason for this?


#11

I guess the answer is that if you just want the probability, you don't need to use the sample() function at all. It does become useful if, like me and @jdlong, you'd like to see the actual p = 1 - (1/6) formula in action via a quick R script.


#12

Why is it 1/6 instead of 5/6 for the probability of not seeing a six?


#13

It is not. The probability of not seeing a six on a single die roll is indeed 5/6, which is exactly what 1 - (1/6) evaluates to.
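
If it helps, the arithmetic can be checked directly in R with no simulation at all. This is a small sketch that assumes a fair six-sided die; the variable names are only illustrative.

```r
# Assumes a fair six-sided die: every face is equally likely.
faces <- 1:6

p_six <- mean(faces == 6)   # one face out of six
p_no6 <- mean(faces != 6)   # five faces out of six

# Complement rule: P(not 6) = 1 - P(6)
all.equal(p_no6, 1 - p_six)
all.equal(p_no6, 5 / 6)
```

So 1/6 is the chance of seeing a six, and subtracting it from 1 gives the 5/6 chance of not seeing one.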


#14

So it's possible to calculate the probability without a simulation?


#15

In this case yes, because each roll is independent of any previous rolls, and a fair die has a small set of discrete, equally likely outcomes.


#16

So you would want simulations for non-discrete or dependent events?


#17

In R, I typically see simulated numbers used to build toy examples or to test code. For those cases, other functions such as runif() or rnorm() are usually chosen to create the simulated data.

The rnorm() function gives you data drawn from a normal distribution, which makes it really good for learning about probabilities of continuous variables:

set.seed(100)
hist(rnorm(100))

Ideally, one starts practicing probability calculations on non-simulated data sets, such as iris or mtcars.
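
As a rough sketch of what learning probabilities from rnorm() can look like, the snippet below estimates P(X > 1) for a standard normal variable by simulation and compares it with the exact value from pnorm(). The cutoff of 1 and the sample size are arbitrary choices of mine, not something from the question.

```r
set.seed(100)
x <- rnorm(100000)          # standard normal draws; sample size is arbitrary

p_sim   <- mean(x > 1)      # simulated estimate of P(X > 1)
p_exact <- 1 - pnorm(1)     # exact value from the normal CDF, about 0.159

c(simulated = p_sim, exact = p_exact)
```

The two numbers should agree to roughly two decimal places, and the gap shrinks as the number of draws grows.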


#18

I see you were using simulation with replace=TRUE. I would think the default param would be better suited for simulating to get the probability.


#19

If you want more simulated draws than the number of possible values, you need to set replace to TRUE. In the example, I requested 100 random values, so the sample() function would have failed if I had left replace at its default of FALSE.
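
Here is a small sketch of that failure, wrapped in tryCatch() so the script keeps running. The variable names are just for illustration.

```r
# sample() cannot draw 100 values from only 6 without replacement.
res <- tryCatch(
  sample(6, 100),                          # replace defaults to FALSE
  error = function(e) conditionMessage(e)  # capture the error text instead
)
res   # an error message rather than a vector of rolls

# With replace = TRUE the same request succeeds.
rolls <- sample(6, 100, replace = TRUE)
length(rolls)
```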