I want to give Andresrcs two or three more hearts for the links; these packages are amazing! Yarnabrina, thank you as well for implementing your solution in R. I found something that worked for me, though not so much for the example I gave: I had to modify it in order to implement a solution with the arules package. I added one variable, which I called customer, so I can treat the fruits as a market basket analysis: I have 10 customers, and each one bought different fruits.
fruits <- c("apples", "pears", "bananas", "cherries")
customer <- rep(1:10, each = 3)
set.seed(1233)
# sample() treats prob as weights and rescales them, so they need not sum to 1
df_fruits <- data.frame(customer = sample(customer, 100, replace = TRUE),
                        fruits = sample(fruits, 100, replace = TRUE, prob = c(0.29, 0.60, 0.5, 0.1)))
# sort the rows by customer
df_fruits <- df_fruits[order(df_fruits$customer),]
library(arules)
# create transactional data (one transaction per customer)
trans <- as(split(df_fruits[,"fruits"], df_fruits[,"customer"]), "transactions")
inspect(trans)
# apply the apriori algorithm
rule <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8, minlen = 2))
summary(rule)
inspect(head(sort(rule, by="lift"), 5))
Based on the output, after a customer has bought bananas and cherries, he/she will also buy apples with an estimated probability of 80% (the confidence), and this pattern occurred for 40% of the customers (the support). A lift of 1 would mean that the two sides of the rule occur together no more often than expected if they were independent, i.e. the pattern happened by chance. The further the lift is above 1, the more interesting the rule.
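In case it helps anyone reading later, here is a minimal sketch of how support, confidence and lift can be computed by hand in base R. The baskets are made up for illustration (they are not the data generated above), and support() is a hypothetical helper I defined here, not an arules function.

```r
# Toy baskets: hypothetical data, NOT the output of the code above
baskets <- list(
  c("apples", "bananas", "cherries"),
  c("apples", "bananas", "cherries"),
  c("apples", "pears"),
  c("bananas", "cherries"),
  c("pears")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("bananas", "cherries")  # left-hand side of the rule
rhs <- "apples"                  # right-hand side of the rule

supp_rule <- support(c(lhs, rhs), baskets)      # share of baskets with lhs and rhs together
conf_rule <- supp_rule / support(lhs, baskets)  # share of lhs baskets that also contain rhs
lift_rule <- conf_rule / support(rhs, baskets)  # confidence relative to the base rate of rhs

round(c(support = supp_rule, confidence = conf_rule, lift = lift_rule), 3)
```

A lift close to 1 here would mean that buying bananas and cherries tells you little about whether apples are bought; with real data, the values reported by inspect() on the arules rules should follow the same definitions.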
I will close the issue with Yarnabrina's solution. Given my sparse example (a single variable), it was the best answer.
Thank you, both!