Counting Sequences of Coin Flips

I simulated this dataset which contains 1000 coin flips - then I calculated the number of "2 Flip Sequences":

Coin <- c('H', 'T')
Results = sample(Coin,1000, replace = TRUE)
My_Data = data.frame(id = 1:1000, Results)



Pairs = data.frame(first = head(My_Data$Results, -1), second = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second Freq
1     H      H  255
2     T      H  245
3     H      T  246
4     T      T  253

I am curious - is it possible to extend the above code for "3 Flip Sequences"?

For example - I tried modifying parts of the code to see how the results change (and hoped to stumble across the correct way to write this code):

# First Attempt
Pairs = data.frame(first = head(My_Data$Results, -1), second  = head(My_Data$Results, -1) , third = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second third Freq
1     H      H     H  255
2     T      H     H  245
3     H      T     H    0
4     T      T     H    0
5     H      H     T    0
6     T      H     T    0
7     H      T     T  246
8     T      T     T  253

# Second Attempt
Pairs = data.frame(first = head(My_Data$Results, -1), second  = tail(My_Data$Results, -1) , third = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second third Freq
1     H      H     H  255
2     T      H     H    0
3     H      T     H    0
4     T      T     H  245
5     H      H     T  246
6     T      H     T    0
7     H      T     T    0
8     T      T     T  253

I am not sure which of these options are correct?

  • In general, I am looking to understand the logic as to how I can adapt the above code for an "arbitrary number of coin flips" (e.g. "4 flip sequences", "5 flip sequences", etc.)
  • Also, this might not be the most efficient way to calculate these frequencies - I would also be interested in learning about other ways that might be more efficient ( e.g. as the overall size of the data increases).

Thanks!

Similar to the question on exam scores

# create two vectors of scores
set.seed(42)
scores_1 <- sample(c(1,0,NA),100,replace = TRUE)
set.seed(222)
scores_2 <- sample(c(1,0,NA),100,replace = TRUE)

# remove NAs
scores_1 <- scores_1[which(!is.na(scores_1))]
scores_2 <- scores_2[which(!is.na(scores_2))]

# trim to a multiple of 3
length(scores_1) %% 3
#> [1] 1
length(scores_2) %% 3
#> [1] 1
scores_1 <- scores_1[-length(scores_1)]
scores_2 <- scores_2[-length(scores_2)]

scores_1
#>  [1] 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0
#> [39] 0 1 0 0 0 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 1
#> [77] 0 1
scores_2
#>  [1] 1 0 0 0 0 1 0 1 0 1 0 0 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0
#> [39] 0 1 0 0 1 1 1 1 1 0 0 0 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1

# find sequences

seq_1 <- rle(scores_1)
seq_2 <- rle(scores_2)
seq_1
#> Run Length Encoding
#>   lengths: int [1:31] 4 3 3 3 5 2 1 6 2 2 ...
#>   values : num [1:31] 1 0 1 0 1 0 1 0 1 0 ...
seq_2
#> Run Length Encoding
#>   lengths: int [1:33] 1 4 1 1 1 1 1 2 2 1 ...
#>   values : num [1:33] 1 0 1 0 1 0 1 0 1 0 ...

# tally results 
success_1 <- length(which(seq_1$value == 0 & seq_1$lengths == 2))
failure_1 <- length(which(seq_1$value == 0 & seq_1$lengths == 3))

success_2 <- length(which(seq_2$value == 0 & seq_2$lengths == 2))
failure_2 <- length(which(seq_2$value == 0 & seq_2$lengths == 3))

# count trials
trial_1 <- success_1 + failure_1
trial_2 <- success_2 + failure_2

# failure proportion
outcomes_1 <- failure_1 / trial_1
outcomes_2 <- failure_2 / trial_2
outcomes_1
#> [1] 0.4285714
outcomes_2
#> [1] 0.5

Created on 2022-12-10 by the reprex package (v2.0.1)

Here is quite a direct approach; slider package is similar to purrr(map) but designed to run over timeseries, or over vectors and data.frames in a rowwise manner, and has nice features for windowed inputs; the window features are useful to you in arrange the coins.


(Results <- sample(c('H', 'T'),1000, replace = TRUE))


library(slider)

my_seq_length <- 3L

grouped_seqs <- slide(
  .x = Results,
  .f = function(x)paste0(x,collapse=""),
  .before= my_seq_length-1L,
  .complete=TRUE
)

as.data.frame(table(unlist(grouped_seqs)))

You can change .complete = TRUE to FALSE to see the partial (incomplete sequences, i.e. the first two coin tosses don't form a triad, they only do after the 3rd toss

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.