I am trying to generate new data to obtain a balanced training set for classification with a decision tree. When I call the SMOTE function, I always get the same error:
Error in names(dn) <- dnn : attempt to set an attribute on NULL
In addition: Warning message:
In names(data) == as.character(form[[2]]) :
  longer object length is not a multiple of shorter object length
I converted every column to a factor with as.factor() and removed the rows containing NAs:
train <- na.omit(train)
> str(train)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 11526 obs. of 5 variables:
$ number: Factor w/ 2 levels "problem",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Land: Factor w/ 29 levels "Australien","Belgien",..: 3 3 3 3 3 3 3 3 9 3 ...
$ direction: Factor w/ 2 levels "LL","RL": 1 1 1 1 1 1 1 1 2 1 ...
$ transmission: Factor w/ 2 levels "AUT","SCH": 1 1 1 1 1 1 1 1 1 1 ...
$ range: Factor w/ 4 levels "1","2","3","4": 3 3 3 2 1 3 2 4 3 2 ...
- attr(*, "na.action")= 'omit' Named int 6500 9748
..- attr(*, "names")= chr "6500" "9748"
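For reference, the preprocessing described above was roughly equivalent to the following sketch. It runs on a toy data frame with made-up values (`toy` and its contents are hypothetical, not my real data) and only illustrates the two steps: converting every column to a factor and dropping the rows that contain NAs.

```r
# Toy data frame with hypothetical values, standing in for the real train set
toy <- data.frame(
  number = c("reference", "problem", NA, "reference"),
  range  = c(3, 2, 1, NA),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; toy[] keeps the data.frame structure
toy[] <- lapply(toy, as.factor)

# Drop rows with any NA; the dropped row indices are stored in attr "na.action"
toy <- na.omit(toy)

str(toy)
```

The "na.action" attribute visible in the str() output above comes from na.omit(), which records the indices of the rows it dropped.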
The head of my training set looks like this:
> head(train,10)
   number    Land                   direction transmission range
1  reference Bundesrep. Deutschland LL        AUT          3
2  reference Bundesrep. Deutschland LL        AUT          3
3  reference Bundesrep. Deutschland LL        AUT          3
4  reference Bundesrep. Deutschland LL        AUT          2
5  reference Bundesrep. Deutschland LL        AUT          1
6  reference Bundesrep. Deutschland LL        AUT          3
7  reference Bundesrep. Deutschland LL        AUT          2
8  problem   Taiwan                 LL        AUT          3
9  reference Bundesrep. Deutschland LL        AUT          4
10 reference Grossbritannien        RL        AUT          3
11 reference Bundesrep. Deutschland LL        SCH          2
And this is my code:
smote_train <- SMOTE(train$number ~ ., data = train, perc.over = 500, k = 5, learner = NULL)
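A possible lead, not a confirmed diagnosis: the warning mentions names(data) == as.character(form[[2]]), which suggests that SMOTE finds the target column by comparing the left-hand side of the formula with the column names. The base-R sketch below does not call SMOTE at all; `cols` is a stand-in I introduced for names(train). It shows that a left-hand side written as train$number turns into three strings and matches no column, while a bare number matches exactly one.

```r
# Stand-in for names(train); assumed to mirror the columns shown in str(train)
cols <- c("number", "Land", "direction", "transmission", "range")

# Formulas are not evaluated when created, so `train` need not exist here
f_bad  <- train$number ~ .
f_good <- number ~ .

# Left-hand side of each formula, converted the way the warning text shows
lhs_bad  <- as.character(f_bad[[2]])   # "$" "train" "number"
lhs_good <- as.character(f_good[[2]])  # "number"

# Comparing 5 column names against 3 strings recycles the shorter vector,
# which triggers the "longer object length" warning (suppressed here)
match_bad  <- suppressWarnings(which(cols == lhs_bad))  # no column matches
match_good <- which(cols == lhs_good)                   # position 1

print(lhs_bad)
print(match_bad)
print(match_good)
```

If this is indeed the cause, writing the formula as number ~ . would be the first thing to try. Passing as.data.frame(train) instead of the tibble may also be worth testing, although I have not verified either change.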