Hi, I was hoping someone could help me with the following.
I have a "hotel reviews" data set covering 7 different hotels, with columns such as good review, bad review, avg score, date, etc. First I had to calculate the MCC score for each hotel, which I did with the following:
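# Matthews correlation coefficient (MCC) from true labels and predictions.
statsConfusionMatrix <- function(sentlabels, preds) {
  # Rows are the true labels, columns the predictions. This assumes both
  # classes occur in each vector, so the table is 2 x 2 with the positive
  # class in the second row and column.
  mytab <- table(sentlabels, preds)
  TP <- as.numeric(mytab[2, 2])
  TN <- as.numeric(mytab[1, 1])
  FN <- as.numeric(mytab[2, 1])
  FP <- as.numeric(mytab[1, 2])
  MCC <- (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  return(list(MCC = MCC))
}
myresults <- list()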
Now, for each hotel with a high enough score (MCC > 0.2), I have to investigate whether negative comments were made about the beds.
I don't know how to proceed any further.
Welcome. I am not totally sure what your question is. If you have a question about how to achieve your goal with code, it's helpful to pose it as a reproducible example (reprex). This makes your issue much easier to understand, and a reprex is a great starting point for offering you a suggestion.
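To give an idea of what the data part of a reprex could look like, here is a minimal sketch. The data frame and its column names (hotel, bad_review, avg_score) are made up for illustration, since I don't know what your real columns are called.

```r
# A tiny made-up stand-in for the real data; the column names are guesses.
reviews <- data.frame(
  hotel      = c("A", "A", "B", "B"),
  bad_review = c("The bed was too hard", "", "Noisy street", ""),
  avg_score  = c(7.1, 7.1, 8.4, 8.4),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, so the output can be
# pasted straight into a forum post.
dput(reviews)
```

With your real data, something like dput(head(your_data, 10)) should give others a small sample they can run, where your_data stands for whatever your data frame is called.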
You might then group by specific hotel, and calculate summary statistics of those hotels (e.g. negative and positive comments). That R4DS chapter covers basics of those operations as well.
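As a rough sketch of that grouping idea in base R, under some loud assumptions: the toy data, the column names (hotel, sentlabels, preds, bad_review), the 0/1 coding of the labels, and the helper name mcc are all mine, not taken from your data. Searching the negative comments for the word "bed" is also just one simple way to read "comments about the beds".

```r
# Toy data; every column name here is an assumption.
reviews <- data.frame(
  hotel      = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sentlabels = c(1, 1, 0, 0, 1, 1, 0, 0),  # true sentiment (1 = positive)
  preds      = c(1, 1, 0, 0, 1, 0, 1, 0),  # predicted sentiment
  bad_review = c("", "", "The bed was too hard", "Noisy street",
                 "", "", "Uncomfortable beds", "Cold breakfast"),
  stringsAsFactors = FALSE
)

# MCC from label and prediction vectors. Fixing the factor levels keeps the
# table 2 x 2 even if a hotel happens to be missing one of the classes.
mcc <- function(sentlabels, preds) {
  tab <- table(factor(sentlabels, levels = c(0, 1)),
               factor(preds, levels = c(0, 1)))
  TP <- as.numeric(tab[2, 2])
  TN <- as.numeric(tab[1, 1])
  FN <- as.numeric(tab[2, 1])
  FP <- as.numeric(tab[1, 2])
  (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}

# Split the data by hotel and compute one MCC per hotel.
by_hotel <- split(reviews, reviews$hotel)
mcc_by_hotel <- sapply(by_hotel, function(d) mcc(d$sentlabels, d$preds))

# Keep only hotels whose MCC is defined and above the threshold.
good_hotels <- names(mcc_by_hotel)[!is.na(mcc_by_hotel) & mcc_by_hotel > 0.2]

# For those hotels, count the negative comments that mention "bed" or "beds".
bed_complaints <- sapply(by_hotel[good_hotels], function(d) {
  sum(grepl("\\bbeds?\\b", d$bad_review, ignore.case = TRUE, perl = TRUE))
})

print(mcc_by_hotel)
print(bed_complaints)
```

The same steps can be written with dplyr's group_by() and summarise() if you prefer the tidyverse style from R4DS. Base R is used here only so the sketch runs without extra packages.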
Given that this is your second very similar question, I should make you aware of our homework policy (FAQ: Homework Policy). We are happy to help with homework, but be sure to mark such questions as homework.