I am trying to fit a C5.0 decision tree model to the following dataset:
DT5_Example
id A B C D E PF
1 1 0.0045 0.765 0.0072 0.938 0.809 1
2 2 0.0022 1 0.0076 0.938 1 1
3 3 0.0030 1 0.0010 0.946 1 1
4 4 0.0054 1 0.0045 0.844 1 0
5 5 0.0046 1 0.0041 0.856 1 1
6 6 0.0048 1 0.0051 0.846 1 0
7 7 0.0038 1 0.0005 0.617 0.987 1
8 8 0.0275 1 0.0103 0.954 1 1
9 9 0.0017 1 0.0129 0.917 1 1
10 10 0.0139 1 0.0059 0.983 1 1
Below is my script:
A<-DT5_Example$A
B<-DT5_Example$B
C<-DT5_Example$C
D<-DT5_Example$D
E<-DT5_Example$E
vars<-c(A, B, C, D, E)
# Converting PF into a factor because it is the outcome variable
DT5_Example2<-DT5_Example %>%
mutate(PFcat=factor(PF, levels = c(0,1))) %>% collect()
# Fitting the C5.0 model to the data
install.packages("C50")
library(C50)
DT5_model<-C5.0(x=DT5_Example2[, vars], y = DT5_Example2$PFcat)
summary(DT5_model)
I received the following error message:
Error: Must subset columns with a valid subscript vector.
x Can't convert from <double> to <integer> due to loss of precision.
If I run the model with PF as an integer variable, I still receive the same message.
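One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: in the script above, `vars` is built from the column values (`c(A, B, C, D, E)`), so it is one long numeric vector rather than a set of column names, and `DT5_Example2[, vars]` would then be trying to use those decimals as column positions. The snippet below uses only base R and a hypothetical three-row stand-in for `DT5_Example` (values copied from the first rows of the table above) to compare the two ways of building `vars`; the names `vars_values`, `vars_names`, and `predictors` are made up for illustration.

```r
# Hypothetical stand-in for the first three rows of DT5_Example
DT5_Example <- data.frame(
  id = 1:3,
  A  = c(0.0045, 0.0022, 0.0030),
  B  = c(0.765, 1, 1),
  C  = c(0.0072, 0.0076, 0.0010),
  D  = c(0.938, 0.938, 0.946),
  E  = c(0.809, 1, 1),
  PF = c(1, 1, 1)
)

# What the script builds: the column values concatenated into one numeric vector
vars_values <- c(DT5_Example$A, DT5_Example$B, DT5_Example$C,
                 DT5_Example$D, DT5_Example$E)
str(vars_values)   # a plain numeric vector, not a list of columns

# A possible alternative (assumption): select the predictors by column name
vars_names <- c("A", "B", "C", "D", "E")
predictors <- DT5_Example[, vars_names]
str(predictors)    # a data frame holding only the five predictor columns
```

If that is what is happening, passing a character vector of names such as `vars_names` to `x =` in `C5.0()` might avoid the subscript error, although I have not confirmed this on the full dataset.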
I have already googled this error and read related topics on the RStudio Community, but I have not been able to fix it. Any help would be appreciated. Thanks.