How to find the most probable sequence combination?

enieyi · June 23, 2019, 2:07pm

Hi,
how do I find the most probable sequence of purchases?
In other words, how do I find the probability that I will buy first cherries, then apples,
then bananas, then pears, given that I initially bought bananas?

fruits <- c("apples", "pears", "bananas", "cherries")
set.seed(1234)
df_fruits <- data.frame(fruits = sample(fruits, 100, replace = TRUE))
# Count transitions: row = fruit bought, column = fruit bought next
table(previously_bought = df_fruits$fruits[-length(df_fruits$fruits)],
      bought_next = df_fruits$fruits[-1])

I have counted how often each fruit is bought right after each fruit. For example:

7 times after I bought apples, I bought apples again
8 times after I bought apples, I bought bananas
9 times after I bought apples, I bought cherries
7 times after I bought apples, I bought pears
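Dividing each row of such a count table by its row total turns the counts into estimated conditional probabilities P(next fruit | previous fruit). A minimal sketch using base R's `prop.table()` with `margin = 1`; the short purchase history `purchases` is hypothetical, made up only for illustration:

```r
# Hypothetical purchase history, in the order the purchases happened
purchases <- c("apples", "bananas", "apples", "cherries", "apples", "bananas", "pears")

# Count transitions: row = previously bought, column = bought next
counts <- table(previously_bought = head(purchases, -1),
                bought_next = tail(purchases, -1))

# Normalize each row so it sums to 1: estimated P(next | previous)
transition_probs <- prop.table(counts, margin = 1)
transition_probs
```

Here apples is followed by bananas twice and by cherries once, so the apples row becomes 0, 2/3, 1/3, 0. A fruit that never appears as "previously bought" (pears, in this toy history) simply gets no row.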

Thank you in advance for your time!

andresrcs · June 23, 2019, 2:31pm

This sounds like association rule mining; take a look at this package.

Yarnabrina · June 23, 2019, 2:41pm

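Maybe something like this? Assuming each purchase depends only on the one immediately before it (a first-order Markov chain), the chain rule reduces to a product of one-step transition probabilities, each estimated as a count divided by its row total: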

\begin{align}
& \ \mathbb{P}(\text{cherry} \rightarrow \text{apple} \rightarrow \text{banana} \rightarrow \text{pear} \mid \text{initially banana}) \\
= & \ \mathbb{P}(\text{pear} \mid \text{cherry} \rightarrow \text{apple} \rightarrow \text{banana}, \text{initially banana}) \times \\
& \ \mathbb{P}(\text{banana} \mid \text{cherry} \rightarrow \text{apple}, \text{initially banana}) \times \\
& \ \mathbb{P}(\text{apple} \mid \text{cherry}, \text{initially banana}) \times \\
& \ \mathbb{P}(\text{cherry} \mid \text{initially banana}) \\
= & \ \mathbb{P}(\text{pear} \mid \text{banana}) \times \mathbb{P}(\text{banana} \mid \text{apple}) \times \\
& \ \mathbb{P}(\text{apple} \mid \text{cherry}) \times \mathbb{P}(\text{cherry} \mid \text{initially banana}) \\
= & \ \frac{8}{6 + 8 + 3 + 8} \times \frac{8}{7 + 8 + 9 + 7} \times \frac{9}{9 + 3 + 3 + 4} \times \frac{3}{6 + 8 + 3 + 8} \\
= & \ \frac{8}{25} \times \frac{8}{31} \times \frac{9}{19} \times \frac{3}{25} \\
= & \ \frac{1728}{368125}
\end{align}
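As a quick arithmetic check, the product of the four fractions in the formula can be computed directly in R; the counts are the ones used in the formula, and the variable names are only illustrative:

```r
# One-step transition probabilities, estimated as count / row total
p_pear_given_banana   <- 8 / (6 + 8 + 3 + 8)   # 8/25
p_banana_given_apple  <- 8 / (7 + 8 + 9 + 7)   # 8/31
p_apple_given_cherry  <- 9 / (9 + 3 + 3 + 4)   # 9/19
p_cherry_given_banana <- 3 / (6 + 8 + 3 + 8)   # 3/25

# Under the Markov assumption the sequence probability is their product
p_sequence <- p_pear_given_banana * p_banana_given_apple *
  p_apple_given_cherry * p_cherry_given_banana
p_sequence                               # approximately 0.004694
all.equal(p_sequence, 1728 / 368125)     # TRUE
```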

Yarnabrina · June 23, 2019, 3:37pm

In case it helps, here's an implementation. I used a different seed because we probably have different R versions, so using 1234 gives me different results from yours. If you find an elegant solution using the package Andres suggested, could you share it?
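If the difference comes from R 3.6.0 changing the default algorithm behind `sample()` (an assumption about the two setups, not something confirmed in this thread), the old behaviour can be requested explicitly through the `sample.kind` argument of `set.seed()`. A minimal sketch comparing the two samplers:

```r
fruits <- c("apples", "pears", "bananas", "cherries")

# Default sampler since R 3.6.0 ("Rejection")
set.seed(1234, sample.kind = "Rejection")
new_draw <- sample(fruits, 100, replace = TRUE)

# Pre-3.6.0 sampler ("Rounding"); R warns that it is non-uniform
set.seed(1234, sample.kind = "Rounding")
old_draw <- sample(fruits, 100, replace = TRUE)

# Restore the modern default for the rest of the session
RNGkind(sample.kind = "Rejection")
```

Using "Rounding" is only sensible for reproducing old results; new analyses should keep the default.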

set.seed(seed = 33734)
dataset <- data.frame(fruits = sample(x = c("apple", "banana", "cherry", "pear"),
                                      size = 100,
                                      replace = TRUE))
(occurrence_matrix <- with(data = dataset,
                           expr = table(fruits[-length(x = fruits)], fruits[-1])))
#>          apple banana cherry pear
#>   apple      8      6      5    8
#>   banana     8      5      4    7
#>   cherry     4      4      1    9
#>   pear       7      9      8    6
(transition_probability_matrix <- (occurrence_matrix / rowSums(x = occurrence_matrix)))
#>               apple     banana     cherry       pear
#>   apple  0.29629630 0.22222222 0.18518519 0.29629630
#>   banana 0.33333333 0.20833333 0.16666667 0.29166667
#>   cherry 0.22222222 0.22222222 0.05555556 0.50000000
#>   pear   0.23333333 0.30000000 0.26666667 0.20000000
desired_combination <- c("banana", "cherry", "apple", "banana", "pear")
desired_positions <- data.frame(from = desired_combination[-length(x = desired_combination)],
                                to = desired_combination[-1])
required_probabilities <- apply(X = desired_positions,
                                MARGIN = 1,
                                FUN = function(t) transition_probability_matrix[t[1], t[2]])
(final_answer <- prod(required_probabilities))
#> [1] 0.002400549
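The title question, finding the *most* probable sequence, can be answered with the same kind of transition matrix by scoring every candidate sequence and keeping the best one. A brute-force sketch, assuming a first-order Markov chain; `sequence_probability()` and `most_probable_sequence()` are hypothetical helper names, and the matrix `P` is made up for illustration (it plays the role of `transition_probability_matrix`):

```r
# Hypothetical transition matrix: rows = current fruit, columns = next fruit
fruits <- c("apple", "banana", "cherry", "pear")
P <- matrix(c(0.10, 0.60, 0.20, 0.10,
              0.30, 0.10, 0.50, 0.10,
              0.70, 0.10, 0.10, 0.10,
              0.25, 0.25, 0.25, 0.25),
            nrow = 4, byrow = TRUE, dimnames = list(fruits, fruits))

# Probability of buying `path` in order, starting from `start`
sequence_probability <- function(P, start, path) {
  states <- c(start, path)
  # Two-column character matrix of (from, to) pairs; indexes P by its dimnames
  steps <- cbind(head(states, -1), tail(states, -1))
  prod(P[steps])
}

# Score every sequence of `n_steps` purchases and return the most probable one
most_probable_sequence <- function(P, start, n_steps) {
  candidates <- expand.grid(rep(list(rownames(P)), n_steps),
                            stringsAsFactors = FALSE)
  probs <- apply(candidates, 1, function(path) sequence_probability(P, start, path))
  best <- which.max(probs)
  list(path = unname(unlist(candidates[best, ])), probability = probs[[best]])
}

most_probable_sequence(P, start = "banana", n_steps = 4)
```

With this made-up matrix the result is cherry, apple, banana, cherry with probability 0.105 (0.5 × 0.7 × 0.6 × 0.5). Applied to the reprex's matrix, `sequence_probability(transition_probability_matrix, "banana", c("cherry", "apple", "banana", "pear"))` should reproduce the 0.002400549 computed there. Enumeration grows as (number of fruits)^(steps), so for long sequences a dynamic-programming (Viterbi-style, max-product) pass would be the better design.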

Created on 2019-06-23 by the reprex package (v0.3.0)

enieyi · June 23, 2019, 4:25pm

Yarnabrina, thank you for your solution and implementation! Yes, of course, I will post it. First I need to read more and test how to use the information from Andresrcs' link to see what works for me.

enieyi · June 27, 2019, 9:45pm

I wish I could give Andresrcs two or three more hearts for the links; these packages are amazing! Yarnabrina, thank you as well for implementing your solution in R. I found something that worked for me, though not directly with the example I gave: to use the arules package I had to modify it. I added a variable called customer so that I can treat the fruits as a market basket analysis: there are 10 customers, and each of them bought different fruits.

fruits <- c("apples", "pears", "bananas", "cherries")
customer <- rep(1:10, each = 3)
set.seed(1233)
# prob holds weights; sample() rescales them, so they need not sum to 1
df_fruits <- data.frame(customer = sample(customer, 100, replace = TRUE),
                        fruits = sample(fruits, 100, replace = TRUE,
                                        prob = c(0.29, 0.60, 0.5, 0.1)))
# order the rows by customer
df_fruits <- df_fruits[order(df_fruits$customer), ]

library(arules)
# create transactional data: one basket (set of distinct fruits) per customer
customer_baskets <- lapply(split(as.character(df_fruits$fruits), df_fruits$customer), unique)
trans <- as(customer_baskets, "transactions")
inspect(trans)

# apply the apriori algorithm
rule <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8, minlen = 2))

summary(rule)
inspect(head(sort(rule, by = "lift"), 5))

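Based on the output, a customer who bought bananas and cherries also bought apples with a probability of 80% (the confidence), and all three fruits appeared together in 40% of the customers' baskets (the support). Note that arules treats each customer's purchases as an unordered set, so the rule describes which fruits are bought together, not the order in which they are bought. A lift equal to 1 means the items occur together exactly as often as expected if they were independent; the greater the lift above 1, the more interesting the rule.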
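To make the three measures concrete, here is a base-R sketch that computes them by hand for the rule {bananas, cherries} => {apples} on a small, made-up set of baskets. `baskets`, `basket_support()` and `rule_measures()` are hypothetical names; arules computes the same quantities for you:

```r
# Hypothetical baskets: each element is the set of fruits one customer bought
baskets <- list(
  c("apples", "bananas", "cherries"),
  c("apples", "bananas", "cherries", "pears"),
  c("bananas", "cherries"),
  c("apples", "pears"),
  c("bananas", "pears")
)

# Share of baskets that contain every item in `items`
basket_support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

rule_measures <- function(lhs, rhs) {
  supp_rule  <- basket_support(c(lhs, rhs))      # P(lhs and rhs)
  confidence <- supp_rule / basket_support(lhs)  # P(rhs | lhs)
  lift       <- confidence / basket_support(rhs) # 1 = independence
  c(support = supp_rule, confidence = confidence, lift = lift)
}

rule_measures(lhs = c("bananas", "cherries"), rhs = "apples")
```

Here {bananas, cherries} appears in 3 of the 5 baskets and 2 of those also contain apples, so support is 0.4 and confidence is 2/3; apples appear in 3 of 5 baskets overall, giving a lift of (2/3) / 0.6 ≈ 1.11.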

I will close the thread with Yarnabrina's solution as the answer. Given my sparse example (a single variable), it was the best fit.
Thank you both!

system · July 4, 2019, 9:45pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.