Counting the Number of Times a Condition Appears in a Data Frame

I am working with the R programming language. I thought of the following question I would like to test:

  • Suppose there is a coin that has a 5% chance of landing on HEADS and a 95% chance of landing on tails

Based on a computer simulation, I want to find out the following :

  • The (average) minimum number of flips before observing HEADS, TAILS, HEADS
  • The (average) average number of flips before observing HEADS, TAILS, HEADS
  • The (average) maximum number of flips before observing HEADS, TAILS, HEADS

I tried to write a simulation in R that flips this coin three times, I call this a "run". The simulation then performs 100 "runs":

results <- list()

for (i in 1:100){

response_i <- c("H","T")
response_i <- sample(response_i, 3, replace=TRUE, 
                        prob=c(0.05, 0.95))
response_i <- as.factor(response_i)

iteration_i = i

run_i = data.frame(response_i, iteration_i)

 results[[i]] <- run_i

}

This looks as follows (e.g. run #22, #23, #24):

[[22]]
  response_i iteration_i
1          T          22
2          T          22
3          T          22

[[23]]
  response_i iteration_i
1          H          23
2          T          23
3          T          23

[[24]]
  response_i iteration_i
1          T          24
2          T          24
3          T          24

My Question:

I would like to modify the above code so that:

1) The simulation automatically stops after you see the first H, T, H (right now, I run the simulation for 100 runs, and hope this is enough runs to observe at least one H, T, T)

2) Once the first H, T, H appears and the simulation automatically stops, I would like to record at which "run" this took place (i.e. what was the value of "iteration_i"? )

3) I would then like to repeat this entire simulation 100 times (100 runs * 100 times = 10,000 coin flips)

Once this is completed, I will be able to make histograms which show the (average) minimum number of coin flips, the (average) average number of coin flips and the (average) maximum number of coin flips before H, T, H was observed (using "ggplot2", I don't think this will be too difficult).

But can someone please help me format/modify my code to simplify what I am trying to achieve? At the moment, I used the following R code to "tidy up" the 100 "runs" from the first simulation:

results_df <- do.call(rbind.data.frame, results)

 head(results_df)
  response_i iteration_i
1          T           1
2          T           1
3          T           1
4          T           2
5          T           2
6          T           2

I then manually repeated this many times, e..g

results_df_1 <- do.call(rbind.data.frame, results)
results_df_1$index = 1

#re-run original simulation
results_df_2<- do.call(rbind.data.frame, results)
results_df_2$index = 2

#re-run original simulation (many times)
results_df_n<- do.call(rbind.data.frame, results)
results_df_n$index = n

final <- data.frame(results_df_1, results_df_2, results_df_n)

I then imported this "final" file into Microsoft Excel and tried to manually collapse the "final file" to answer my three original questions - but I was hoping that someone could show me how to do this by modifying my original code in R.

Can someone please help me with this?

Thanks!

# constants
outcomes <- c("H","T")
hit      <- "HTH"
# probability of "H" outcome
prob_h   <- 0.05
# number of coin flips
flips    <- 100
# number of trials
trials   <- 100

# functions
throw <- function(x,y,z) sample(x, y, replace=TRUE, prob=c(z, 1-z))
score   <- function(x) {
  runs   = rle(x)
  result = data.frame(outcome = runs$values,
                       streak  = runs$lengths,
                       throws  = cumsum(runs$lengths))
}

# example sequence of heads and tails
card <- score(throw(outcomes,flips,prob_h))
head(card)
#>   outcome streak throws
#> 1       T     23     23
#> 2       H      1     24
#> 3       T     24     48
#> 4       H      1     49
#> 5       T      4     53
#> 6       H      1     54

indices <- which(card$outcome == "H")
head(indices)
#> [1]  2  4  6  8 10 12
starts  <- card[indices,3]
head(starts)
#> [1] 24 49 54 75 80 97

# first element will be the flip at which the first HTH sequence began
starts[1]
#> [1] 24
# number of flips before first HTH sequence
starts[1] - 1
#> [1] 23


run_one <- function() {
  card = score(throw(outcomes,flips,prob_h))
  indices = which(card$outcome == "H")
  starts  = card[indices,3]
  before  = starts[1] -1
  return(before)
}

results <- vector(trials,mode = "integer")
for(i in 1:trials) results[i] = run_one()

results
#>   [1] 13 13  2 25 11 16  0  4 18 24  6 NA 19 11 48 33 35 49 22  3 38  6 57 22  3
#>  [26] 15  2 34 14  2 17  2  9 14 32 15 13  6  0 20 72 20  1 10  1  3 29  0 23  6
#>  [51] 16  2 15 13  0  4 14 23  6 12 37 13  9 29 48 23  8  5 12 11 19 12  5  7 13
#>  [76]  1  9 17 31 14  5  8 21  8  3  2  0 80 27  5  5 19  8 10  7  5  2 19 10 24
sum(is.na(results)) # no runs of HTH in 100 flips
#> [1] 1
quantile(results,na.rm = TRUE)
#>   0%  25%  50%  75% 100% 
#>  0.0  5.0 12.0 20.5 80.0
min(results,na.rm = TRUE)
#> [1] 0
mean(results,na.rm = TRUE)
#> [1] 15.49495
sd(results,na.rm = TRUE)
#> [1] 14.90677
median(results,na.rm = TRUE)
#> [1] 12
max(results,na.rm = TRUE)
#> [1] 80

# number of runs in which HTH was the first sequence
length(which(results == 0))
#> [1] 5
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.