 # How to find the most probable sequence combination?

Hi,
how can I find the most probable sequence of purchases?
In other words, how do I find the probability that I will buy cherries first, then an apple,
then a banana, then a pear, given that I initially bought bananas?

fruits <- c("apples", "pears","bananas", "cherries")
set.seed(1234)
df_fruits <- data.frame(fruits=sample(fruits,100, replace = T))
table(previously_bought = df_fruits$fruits[-length(df_fruits$fruits)], bought_next = df_fruits$fruits[-1])

I have counted the occurrences of each fruit after every fruit. For example, after buying an apple, I bought an apple again 7 times, a banana 8 times, cherries 9 times, and a pear 7 times.

Thank you in advance for your time!

This sounds like Association Rules Mining; take a look at this package.

Maybe something like this?

\begin{align}
& \ \mathbb{P}(cherry \rightarrow apple \rightarrow banana \rightarrow pear \mid initially \ banana) \\
= & \ \mathbb{P}(pear \mid cherry \rightarrow apple \rightarrow banana, initially \ banana) \times \\
& \ \mathbb{P}(banana \mid cherry \rightarrow apple, initially \ banana) \times \\
& \ \mathbb{P}(apple \mid cherry, initially \ banana) \times \\
& \ \mathbb{P}(cherry \mid initially \ banana) \\
= & \ \mathbb{P}(pear \mid banana) \times \\
& \ \mathbb{P}(banana \mid apple) \times \\
& \ \mathbb{P}(apple \mid cherry) \times \\
& \ \mathbb{P}(cherry \mid initially \ banana) \\
= & \ \frac{8}{6 + 8 + 3 + 8} \times \\
& \ \frac{8}{7 + 8 + 9 + 7} \times \\
& \ \frac{9}{9 + 3 + 3 + 4} \times \\
& \ \frac{3}{6 + 8 + 3 + 8} \\
= & \ \frac{1728}{368125}
\end{align}

The second equality assumes that each purchase depends only on the previous one.

In case it helps, here's an implementation. I used a different seed because we probably have different R versions, so I get different results from yours with seed 1234. If you find an elegant solution using the package Andres suggested, could you please share it?
set.seed(seed = 33734)

dataset <- data.frame(fruits = sample(x = c("apple", "banana", "cherry", "pear"), size = 100, replace = TRUE))

(occurrence_matrix <- with(data = dataset, expr = table(fruits[-length(x = fruits)], fruits[-1])))
#>         
#>          apple banana cherry pear
#>   apple      8      6      5    8
#>   banana     8      5      4    7
#>   cherry     4      4      1    9
#>   pear       7      9      8    6

(transition_probability_matrix <- (occurrence_matrix / rowSums(x = occurrence_matrix)))
#>         
#>               apple     banana     cherry       pear
#>   apple  0.29629630 0.22222222 0.18518519 0.29629630
#>   banana 0.33333333 0.20833333 0.16666667 0.29166667
#>   cherry 0.22222222 0.22222222 0.05555556 0.50000000
#>   pear   0.23333333 0.30000000 0.26666667 0.20000000

desired_combination <- c("banana", "cherry", "apple", "banana", "pear")
desired_positions <- data.frame(from = desired_combination[-length(x = desired_combination)], to = desired_combination[-1])

# look up the transition probability for each (from, to) pair
required_probabilities <- apply(X = desired_positions, MARGIN = 1, FUN = function(t) transition_probability_matrix[t[1], t[2]])

(final_answer <- prod(required_probabilities))
#> [1] 0.002400549

Created on 2019-06-23 by the reprex package (v0.3.0)

Yarnabrina, thank you for your solution and implementation! Yes, of course, I will post it. First I need to read more and test how to use the information from Andresrcs' link, to see what works for me.

I want to give Andresrcs two or three more hearts for the links; these packages are amazing! Yarnabrina, thank you as well for implementing your solution in R.

I found something that worked for me, although not quite for the example I gave: I had to modify the example in order to implement a solution with the arules package. I added one variable, called customer, so that I can treat the fruits as a market basket analysis. I have 10 customers, and each one bought different fruits.
fruits <- c("apples", "pears","bananas", "cherries")
customer <- rep(c(1:10), each = 3)
set.seed(1233)
df_fruits <- data.frame(customer = sample(customer, 100, replace = T), fruits = sample(fruits,100, replace = T, prob=c(0.29,0.60,0.5,0.1)))
# order the numeric variable
df_fruits <- df_fruits[order(df_fruits$customer),]

library(arules)
# create transactional data
trans <- as(split(df_fruits[,"fruits"], df_fruits[,"customer"]), "transactions")
inspect(trans)

# apply apriori algorithm
rule <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8,minlen=2))

summary(rule)
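
Coming back to the "most probable" part of the original question, here is a rough base R sketch that builds on the transition-matrix idea from the implementation earlier in the thread. It assumes that each purchase depends only on the previous one, and it enumerates every sequence of four follow-up purchases after an initial banana, with repeats allowed (that reading of "sequence combination" is my assumption). The helper `sequence_probability` and the variable names are made up for illustration and do not come from any package. Results will depend on the seed and on the R version.

```r
fruits <- c("apple", "banana", "cherry", "pear")
set.seed(33734)

# Simulated purchase history; a factor keeps all four levels in the table
purchases <- factor(sample(fruits, size = 100, replace = TRUE), levels = fruits)

# Transition counts, then row-normalised transition probabilities
counts <- table(from = purchases[-length(purchases)], to = purchases[-1])
trans_prob <- prop.table(counts, margin = 1)

# Hypothetical helper: probability of a path as the product of the
# transition probabilities between consecutive purchases
sequence_probability <- function(path, P) {
  from <- path[-length(path)]
  to <- path[-1]
  prod(mapply(function(a, b) P[a, b], from, to))
}

# All 4^4 possible sequences of four follow-up purchases after "banana"
candidates <- expand.grid(rep(list(fruits), 4), stringsAsFactors = FALSE)
paths <- data.frame(start = "banana", candidates, stringsAsFactors = FALSE)

# Score every candidate path and keep the most probable one
probs <- apply(paths, 1, sequence_probability, P = trans_prob)
best_path <- unname(unlist(paths[which.max(probs), ]))

best_path
max(probs)
```

Enumerating all paths is fine for four fruits and four steps, but the number of candidates grows exponentially with the sequence length, so longer sequences would need a different approach.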