Counting Sequences of Coin Flips

omario · December 11, 2022, 6:51am

I simulated this dataset which contains 1000 coin flips - then I calculated the number of "2 Flip Sequences":

Coin <- c('H', 'T')
Results = sample(Coin,1000, replace = TRUE)
My_Data = data.frame(id = 1:1000, Results)



Pairs = data.frame(first = head(My_Data$Results, -1), second = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second Freq
1     H      H  255
2     T      H  245
3     H      T  246
4     T      T  253

I am curious - is it possible to extend the above code for "3 Flip Sequences"?

For example - I tried modifying parts of the code to see how the results change (and hoped to stumble across the correct way to write this code):

# First Attempt
Pairs = data.frame(first = head(My_Data$Results, -1), second  = head(My_Data$Results, -1) , third = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second third Freq
1     H      H     H  255
2     T      H     H  245
3     H      T     H    0
4     T      T     H    0
5     H      H     T    0
6     T      H     T    0
7     H      T     T  246
8     T      T     T  253

# Second Attempt
Pairs = data.frame(first = head(My_Data$Results, -1), second  = tail(My_Data$Results, -1) , third = tail(My_Data$Results, -1))
Final = as.data.frame(table(Pairs))

  first second third Freq
1     H      H     H  255
2     T      H     H    0
3     H      T     H    0
4     T      T     H  245
5     H      H     T  246
6     T      H     T    0
7     H      T     T    0
8     T      T     T  253

I am not sure which of these options are correct?

In general, I am looking to understand the logic as to how I can adapt the above code for an "arbitrary number of coin flips" (e.g. "4 flip sequences", "5 flip sequences", etc.)
Also, this might not be the most efficient way to calculate these frequencies - I would also be interested in learning about other ways that might be more efficient ( e.g. as the overall size of the data increases).

Thanks!

technocrat · December 11, 2022, 7:52am

Similar to the question on exam scores

# create two vectors of scores
set.seed(42)
scores_1 <- sample(c(1,0,NA),100,replace = TRUE)
set.seed(222)
scores_2 <- sample(c(1,0,NA),100,replace = TRUE)

# remove NAs
scores_1 <- scores_1[which(!is.na(scores_1))]
scores_2 <- scores_2[which(!is.na(scores_2))]

# trim to a multiple of 3
length(scores_1) %% 3
#> [1] 1
length(scores_2) %% 3
#> [1] 1
scores_1 <- scores_1[-length(scores_1)]
scores_2 <- scores_2[-length(scores_2)]

scores_1
#>  [1] 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0
#> [39] 0 1 0 0 0 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 1
#> [77] 0 1
scores_2
#>  [1] 1 0 0 0 0 1 0 1 0 1 0 0 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0
#> [39] 0 1 0 0 1 1 1 1 1 0 0 0 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1

# find sequences

seq_1 <- rle(scores_1)
seq_2 <- rle(scores_2)
seq_1
#> Run Length Encoding
#>   lengths: int [1:31] 4 3 3 3 5 2 1 6 2 2 ...
#>   values : num [1:31] 1 0 1 0 1 0 1 0 1 0 ...
seq_2
#> Run Length Encoding
#>   lengths: int [1:33] 1 4 1 1 1 1 1 2 2 1 ...
#>   values : num [1:33] 1 0 1 0 1 0 1 0 1 0 ...

# tally results 
success_1 <- length(which(seq_1$value == 0 & seq_1$lengths == 2))
failure_1 <- length(which(seq_1$value == 0 & seq_1$lengths == 3))

success_2 <- length(which(seq_2$value == 0 & seq_2$lengths == 2))
failure_2 <- length(which(seq_2$value == 0 & seq_2$lengths == 3))

# count trials
trial_1 <- success_1 + failure_1
trial_2 <- success_2 + failure_2

# failure proportion
outcomes_1 <- failure_1 / trial_1
outcomes_2 <- failure_2 / trial_2
outcomes_1
#> [1] 0.4285714
outcomes_2
#> [1] 0.5

^{Created on 2022-12-10 by the reprex package (v2.0.1)}

nirgrahamuk · December 12, 2022, 11:38am

Here is quite a direct approach; slider package is similar to purrr(map) but designed to run over timeseries, or over vectors and data.frames in a rowwise manner, and has nice features for windowed inputs; the window features are useful to you in arrange the coins.


(Results <- sample(c('H', 'T'),1000, replace = TRUE))


library(slider)

my_seq_length <- 3L

grouped_seqs <- slide(
  .x = Results,
  .f = function(x)paste0(x,collapse=""),
  .before= my_seq_length-1L,
  .complete=TRUE
)

as.data.frame(table(unlist(grouped_seqs)))

You can change .complete = TRUE to FALSE to see the partial (incomplete sequences, i.e. the first two coin tosses don't form a triad, they only do after the 3rd toss

system · January 23, 2023, 11:38am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.