Hi,
I'm working on tennis betting. I built a model that predicts whether a given player wins or loses a match, and it reaches 90% accuracy. Here is the code:
Load packages:
library(caret)
library(h2o)
library(dplyr)
Part 1: Data
The atp_data.csv dataset can be obtained by running main.py from the edouardthom/ATPBetting repository on GitHub (ATPBetting/main.py at master), using the following .xls files:
Load data:
df <- read.csv("atp_data.csv")
colnames(df)
"ATP" "Location" "Tournament" "Date" "Series" "Court" "Surface" "Round" "Best.of" "Winner"
"Loser" "WRank" "LRank" "Wsets" "Lsets" "Comment" "PSW" "PSL" "B365W" "B365L"
"elo_winner" "elo_loser" "proba_elo"
Part 2: Modeling
df1 <- df %>%
  filter(Winner == "Hajek J." | Loser == "Hajek J.")
# target: 1 if Hajek J. won the match, 2 if he lost
# (ifelse avoids the empty-index pitfall of vet[-which(...)] when there are no wins)
df1$target <- factor(ifelse(df1$Winner == "Hajek J.", 1, 2), levels = c(1, 2))
h2o.init()
n <- nrow(df1)
n_train <- round(n * 0.7)
training <- df1[1:n_train, ]
# start one row after the training set so the two sets do not share a row
testing <- df1[(n_train + 1):n, ]
train <- as.h2o(training)
y <- "target"
x <- setdiff(names(train), y)
aml <- h2o.automl(x = x, y = y,
                  training_frame = train,
                  max_runtime_secs = 120)
model <- aml@leader
model
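p2 <- h2o.predict(model, newdata = as.h2o(testing))
df3 <- as.data.frame(p2)
# give the predictions the same factor levels as the reference before comparing
confusionMatrix(factor(df3$predict, levels = levels(testing$target)), testing$target)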
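One caveat about the predictors above: x contains every column except target, including Winner, Loser, Wsets, Lsets and Comment, which are only known once the match is over. The remaining columns (WRank, LRank, elo_winner, elo_loser, B365W, B365L, ...) are also labelled by winner and loser. For an upcoming match the features would probably have to be expressed from the player's point of view instead. Below is a minimal sketch of that recoding in base R. The helper name to_player_view, the new column names and the toy values are all made up for illustration; only the input column names come from atp_data.csv.

```r
# Toy rows with the same column names as atp_data.csv (values are made up).
toy <- data.frame(
  Winner = c("Hajek J.", "Player A"),
  Loser = c("Player B", "Hajek J."),
  Surface = c("Clay", "Hard"),
  WRank = c(90, 40), LRank = c(120, 95),
  elo_winner = c(1550, 1600), elo_loser = c(1500, 1540),
  B365W = c(1.6, 1.3), B365L = c(2.3, 3.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: re-express the winner/loser columns from the point of
# view of one player, so that no feature depends on knowing who won.
to_player_view <- function(d, player) {
  won <- d$Winner == player
  data.frame(
    Surface = d$Surface,
    player_rank = ifelse(won, d$WRank, d$LRank),
    opponent_rank = ifelse(won, d$LRank, d$WRank),
    player_elo = ifelse(won, d$elo_winner, d$elo_loser),
    opponent_elo = ifelse(won, d$elo_loser, d$elo_winner),
    player_odds = ifelse(won, d$B365W, d$B365L),
    opponent_odds = ifelse(won, d$B365L, d$B365W),
    # same coding as above: 1 = the player won, 2 = the player lost
    target = factor(ifelse(won, 1, 2), levels = c(1, 2))
  )
}

features <- to_player_view(toy, "Hajek J.")
```

With this layout, a row for a future match needs only the surface and the two players' current rank, Elo rating and odds. The target column is simply unknown for that row.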
However, I don't understand where to find the data needed to make predictions. Upcoming matches are listed at http://livescore.tennis-data.co.uk/, but where can I find the atp_data.csv variables for those matches, so that I can make predictions for a given player? Thanks
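To make the question concrete: some of the variables could presumably be recomputed rather than downloaded. For example, if proba_elo follows the standard Elo expected-score formula (an assumption on my part, I have not confirmed it against the ATPBetting code), it can be derived from the two players' ratings. A minimal sketch, with the hypothetical function name elo_win_prob and made-up ratings:

```r
# Standard Elo expected score: probability that the player beats the opponent.
# Assumption: proba_elo in atp_data.csv is built from this same formula.
elo_win_prob <- function(elo_player, elo_opponent) {
  1 / (1 + 10^((elo_opponent - elo_player) / 400))
}

# Made-up ratings for an upcoming match.
p_player <- elo_win_prob(1550, 1500)
p_opponent <- elo_win_prob(1500, 1550)
```

The two probabilities sum to 1, and equal ratings give 0.5. What I still cannot see is where to get the current ranks, Elo ratings and odds for matches that have not been played yet.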