I am currently conducting a study on the predictive quality of betting odds for football (soccer). I have odds from multiple bookmakers for each season and league in the study (sample below). The percentage of correctly predicted matches is rather low: usually between 40% and 50%, sometimes as low as 30%, and rarely above 50%. Is there anything wrong with the code, or with the data I am providing to the decision tree, that is causing such a low percentage?
I have already tried k-fold cross-validation and adding extra data such as Elo ratings, to no avail. Rows with null values are excluded. I have tried supplying the teams both as factors and as dummy variables.
Structure of Data
|-----|-----------|-----------|------|------|------|
| FTR | Home Team | Away Team | BetH | BetD | BetA |
|-----|-----------|-----------|------|------|------|
| H   | Chelsea   | Liverpool | 1.35 | 3.35 | 2.65 |
R Code
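library(C50)      # C5.0()
library(gmodels)  # CrossTable()

DT1 <- x  # x holds the data in the structure shown above
set.seed(123)
DT1$FTR <- as.factor(DT1$FTR)

# 60/40 random train/test split
DT1.rows <- nrow(DT1)
DT1.sample <- sample(DT1.rows, floor(DT1.rows * 0.6))
DT1.train <- DT1[DT1.sample, ]
DT1.test <- DT1[-DT1.sample, ]

# Column 1 is the outcome (FTR); trials = 100 requests up to 100 boosting iterations
DT1.model <- C5.0(DT1.train[, -1], DT1.train$FTR, trials = 100)
plot(DT1.model)
summary(DT1.model)

# Predict on the held-out rows and cross-tabulate actual vs. predicted results
DT1.predict <- predict(DT1.model, DT1.test[, -1])
CrossTable(
  DT1.test$FTR,
  DT1.predict,
  prop.c = FALSE,
  prop.r = FALSE,
  prop.chisq = FALSE
)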
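For context on whether 40-50% is actually low, a minimal sketch of two simple baselines the tree's accuracy could be compared against: always predicting the most common result, and always predicting the outcome with the shortest odds. The data frame `toy` and its values are made up purely for illustration (only the column layout follows the table above), and the names `favourite`, `favourite_acc` and `majority_acc` are hypothetical. It uses base R only.

```r
# Hypothetical toy rows in the same layout as the table above (values are invented).
toy <- data.frame(
  FTR  = factor(c("H", "D", "A", "H"), levels = c("H", "D", "A")),
  BetH = c(1.35, 2.60, 3.10, 2.05),
  BetD = c(3.35, 3.20, 3.30, 3.40),
  BetA = c(2.65, 2.75, 2.30, 3.60)
)

# Baseline 1: always predict the most frequent result in the data.
majority_acc <- max(table(toy$FTR)) / nrow(toy)

# Baseline 2: predict the outcome with the lowest decimal odds (the favourite).
odds <- as.matrix(toy[, c("BetH", "BetD", "BetA")])
favourite <- factor(
  c("H", "D", "A")[apply(odds, 1, which.min)],
  levels = levels(toy$FTR)
)
favourite_acc <- mean(favourite == toy$FTR)

print(c(majority = majority_acc, favourite = favourite_acc))
```

On real data the same two numbers would be computed on `DT1.test`, so that the tree's hit rate can be read against them rather than against 100%.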