I have a dataset for an intrusion detection system stored in a CSV file. When I run my script, I get an error.

amjadh · August 15, 2023, 12:19pm

library(kernlab)

library(caret)
anomaly<-read.csv("D:/datasets/data/Dataset_Anomaly.csv", na.strings=c(".", "NA", "", "?"), strip.white=TRUE, encoding="UTF-8")
aRow<-nrow(anomaly)
aCol<-ncol(anomaly)

sub<-sample(1:aRow,floor(0.66*aRow))
anomalyTrainingSet<- anomaly[sub,]
anomalyTestSet<- anomaly[-sub,]
anomalyClassifier<- ksvm(AttackType~.,data=anomalyTrainingSet,type = 'C-svc', kernel = 'rbfdot')
Error in if ((type(ret) == "C-svc" || type(ret) == "nu-svc" || type(ret) == :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
2: In .local(x, ...) : NAs introduced by coercion
anomalyPrediction<-predict(anomalyClassifier, anomalyTestSet[,-aCol])
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'predict': object 'anomalyClassifier' not found
confusionMatrix(anomalyPrediction,anomalyTestSet[,aCol] )
Error: object 'anomalyPrediction' not found

GeraldineK · August 15, 2023, 12:27pm

Your first error (Error in if ((type(ret) == "C-svc" || ...) is raised by code inside the function ksvm(). You have not posted enough detail for anyone to troubleshoot it properly. FYI: try to post a reprex (a "reproducible example") with some short example data, so others can run your code and reproduce the error.
What this error message means is that R is trying to evaluate the conditions (e.g. type(ret) == "C-svc") but somewhere there is a missing value (NA) instead of an actual value (a number or a string). R cannot branch on that: if type(ret) is NA, comparing it with "C-svc" also gives NA, and if() needs TRUE or FALSE. You need to check your data, or your arguments to ksvm().
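
To illustrate the point about NA in a condition, here is a minimal sketch in base R. It does not call ksvm(); the variable names x, cmp and msg are made up for the illustration, and the NA is created by hand rather than coming from the real data.

```r
# A missing value standing in for whatever ksvm() computed internally
x <- NA_character_

# Comparing NA with a string gives NA, not TRUE or FALSE
cmp <- x == "C-svc"
print(cmp)

# if() cannot branch on NA; tryCatch() captures the error message
# instead of stopping the script
msg <- tryCatch(
  if (cmp) "match" else "no match",
  error = function(e) conditionMessage(e)
)
print(msg)
```

In an English locale the captured message should be the same "missing value where TRUE/FALSE needed" text shown in the original error.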

amjadh · August 15, 2023, 12:32pm

Dear,
Thank you for your notes.
Can I send the dataset to you by email?

Note:
This script worked correctly with the same dataset before I changed my PC and R version.

FJCC · August 15, 2023, 12:37pm

I also note that the error message contains

1: In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
2: In .local(x, ...) : NAs introduced by coercion

If your data frame contains a column with constant values, try removing that column before processing the data.

amjadh · August 15, 2023, 12:44pm

Dear,
Is there a function in R to remove constant columns while reading a CSV file?

amjadh · August 15, 2023, 12:51pm

srv_count	serror_rate	srv_error_rate	rerror_rate	srv_rerror_rate	same_srv_rate	diff_srv_rate	srv_diff_host_rate	dst_host_count	dst_host_srv_count	dst_host_same_srv_rate	dst_host_diff_srv_rate	dst_host_same_src_port_rate	dst_host_srv_diff_host_rate	dst_host_serror_rate	dst_host_srv_serror_rate	dst_host_rerror_rate	dst_host_srv_rerror_rate	AttackType
0.001	0.05	0.1	0	0	0.05	0.1	0	0.218	0.001	0	0.095	0.096	0	0.096	0.1	0	0	Attack
0.001	0.071	0.1	0.029	0	0.014	0.057	0	0.255	0.001	0	0.002	0.003	0	0.002	0.1	0.001	0	Attack
0.003	0	0	0	0	0.1	0	0	0.031	0.255	0.1	0	0.003	0.004	0	0	0	0	Normal
0.013	0	0	0	0	0.1	0	0	0.235	0.255	0.1	0	0	0.001	0	0	0	0	Normal
0.008	0	0	0	0	0.1	0	0	0.021	0.156	0.1	0	0.005	0.004	0	0	0	0	Normal
0.023	0	0	0	0	0.1	0	0.009	0.166	0.255	0.1	0	0.001	0.002	0	0	0	0	Normal
0.003	0	0	0	0	0.1	0	0	0.072	0.072	0.1	0	0.001	0	0	0	0	0	Normal
0.001	0	0	0	0	0.1	0	0	0.098	0.018	0.01	0.005	0.001	0.011	0	0	0	0	Normal
0.001	0	0	0	0	0.1	0	0	0.039	0.255	0.1	0	0.003	0.004	0	0	0	0	Normal
0.013	0	0	0	0	0.1	0	0	0.025	0.255	0.1	0	0.004	0.004	0	0.001	0	0	Normal
0.011	0	0	0	0	0.1	0	0	0.255	0.255	0.1	0	0	0	0	0	0	0	Normal
0.008	0	0	0	0	0.1	0	0	0.049	0.255	0.1	0	0.004	0.002	0	0	0	0	Normal
0.001	0.1	0.1	0	0	0.1	0	0	0.214	0.001	0	0.095	0.096	0	0.096	0.1	0	0	Attack
0.001	0	0	0	0	0.1	0	0	0.033	0.017	0.009	0.012	0.003	0.012	0	0	0	0	Normal
0.005	0	0	0	0	0.1	0	0	0.043	0.255	0.1	0	0.002	0.004	0	0	0	0	Normal
0.315	0	0	0	0	0.1	0	0	0.147	0.002	0.001	0.002	0.001	0	0	0	0	0	Attack
0.001	0.004	0	0.089	0.1	0.001	0.099	0	0.255	0.001	0	0.067	0	0	0.002	0	0.06	0.1	Attack
0.001	0	0	0	0	0.033	0.067	0	0.035	0.012	0.011	0.011	0.003	0.017	0	0	0	0	Normal
0.001	0	0	0	0	0.1	0	0	0.073	0.045	0.062	0.007	0.001	0	0	0	0	0	Normal
0.014	0	0	0	0	0.1	0	0	0.049	0.255	0.1	0	0.002	0.004	0	0	0	0	Normal
0.014	0	0	0	0	0.1	0	0	0.041	0.132	0.1	0	0.002	0.004	0	0	0	0	Normal
0.005	0	0	0	0	0.1	0	0	0.044	0.255	0.1	0	0.002	0.005	0	0	0	0	Normal
0.017	0	0	0	0	0.1	0	0	0.255	0.255	0.1	0	0	0	0	0	0	0	Normal

FJCC · August 15, 2023, 1:12pm

It looks like none of your columns are constant but some have very few non-zero values. I suspect that when you sample the data to make a training set, you get a subset that does have at least one constant column. You can manually look for constant columns with the summary() function. It will show that the Min and Max are the same in the constant column. You can then remove the column using its column number. Here is an example where I remove the third column.

DF <- data.frame(A = rnorm(5), B = rnorm(5), C = 0, D = rnorm(5))
DF
#>            A          B C          D
#> 1 -1.4695138 -0.2189530 0 -0.9091177
#> 2 -1.0583674 -0.6116602 0 -0.2796064
#> 3 -0.2176584  0.7033546 0 -0.1457569
#> 4  0.7765660  1.5513289 0 -0.2933012
#> 5 -0.7435114  1.3171561 0  0.1938422
summary(DF)
#>        A                 B                 C           D          
#>  Min.   :-1.4695   Min.   :-0.6117   Min.   :0   Min.   :-0.9091  
#>  1st Qu.:-1.0584   1st Qu.:-0.2190   1st Qu.:0   1st Qu.:-0.2933  
#>  Median :-0.7435   Median : 0.7034   Median :0   Median :-0.2796  
#>  Mean   :-0.5425   Mean   : 0.5482   Mean   :0   Mean   :-0.2868  
#>  3rd Qu.:-0.2177   3rd Qu.: 1.3172   3rd Qu.:0   3rd Qu.:-0.1458  
#>  Max.   : 0.7766   Max.   : 1.5513   Max.   :0   Max.   : 0.1938
DF_new <- DF[, -3]
summary(DF_new)
#>        A                 B                 D          
#>  Min.   :-1.4695   Min.   :-0.6117   Min.   :-0.9091  
#>  1st Qu.:-1.0584   1st Qu.:-0.2190   1st Qu.:-0.2933  
#>  Median :-0.7435   Median : 0.7034   Median :-0.2796  
#>  Mean   :-0.5425   Mean   : 0.5482   Mean   :-0.2868  
#>  3rd Qu.:-0.2177   3rd Qu.: 1.3172   3rd Qu.:-0.1458  
#>  Max.   : 0.7766   Max.   : 1.5513   Max.   : 0.1938

Created on 2023-08-15 with reprex v2.0.2

You would need to do something similar with your data frame anomalyTrainingSet. There are many other ways to remove a column but that should be sufficient for this case.
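
As an alternative to reading summary() by eye, the check can be done programmatically. The following is only a sketch in base R with made-up data: constant_cols() is a hypothetical helper written for this example, not a function from kernlab or caret, and the data frame is not the real dataset. It also shows the situation suspected above, where a column that varies in the full data becomes constant in a training subset.

```r
# Made-up data: C is constant everywhere, D is zero except in the last row
DF <- data.frame(A = rnorm(10), B = rnorm(10), C = 0,
                 D = c(rep(0, 9), 0.1),
                 AttackType = factor(rep(c("Attack", "Normal"), 5)))

# Hypothetical helper: names of numeric columns with at most one distinct value
constant_cols <- function(df) {
  is_const <- vapply(df, function(col) {
    is.numeric(col) && length(unique(col[!is.na(col)])) <= 1
  }, logical(1))
  names(df)[is_const]
}

# Only C is constant in the full data
constant_cols(DF)

# A subset can have more constant columns than the full data:
# the first 9 rows leave D all zero
train <- DF[1:9, ]
to_drop <- constant_cols(train)
to_drop

# Drop those columns from the training set by name
train_clean <- train[, setdiff(names(train), to_drop), drop = FALSE]
names(train_clean)
```

Because the set of constant columns depends on which rows were sampled, it is probably safest to run the check on anomalyTrainingSet after sampling rather than on the full data frame.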

amjadh · August 15, 2023, 1:22pm

Dear,
The code and the dataset are available at this link.

Could you help me run it?

FJCC · August 15, 2023, 3:42pm

I have been busy and will be busy for the next few hours. Please try what I suggested and report any specific problems so others can help you.

GeraldineK · August 18, 2023, 7:48pm

I had a look at the code and the dataset. The (first) problem is that AttackType is not categorical: it is not a factor. Add anomaly$AttackType <- factor(anomaly$AttackType) before sampling and creating the training and test sets. FYI: always read the documentation for a function; it is fairly clear that for type = "C-svc" your response variable (y) needs to be a factor.

While this makes the code run, note that you still get the warning:

Warning message: In .local(x, ...) : Variable(s) `' constant. Cannot scale data.

Using the summary() approach by @FJCC will show you which columns have the same values throughout (meaning they are useless in terms of predicting or explaining anything). Always keep in mind that a warning is just that, a warning, and your code will still run. It is your responsibility to check what the warning is about and whether or not it's something you can ignore, or if you need to fix something.
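
The factor conversion suggested above can be sketched as follows. This is an illustration only: csv_text is a tiny made-up stand-in for the real CSV file, and the alternative with stringsAsFactors = TRUE is an addition that was not part of the original advice. Since R 4.0.0, read.csv() no longer converts strings to factors by default, which may explain why the same script worked under an older R version.

```r
# Tiny made-up stand-in for the real CSV file (values are illustrative only)
csv_text <- "srv_count,serror_rate,AttackType
0.001,0.05,Attack
0.003,0,Normal
0.013,0,Normal"

anomaly <- read.csv(text = csv_text, strip.white = TRUE)

# In R >= 4.0.0 this is "character"; in older versions it was "factor"
class(anomaly$AttackType)

# Convert the response to a factor before sampling and splitting
anomaly$AttackType <- factor(anomaly$AttackType)
levels(anomaly$AttackType)

# Alternative: ask read.csv() to convert all string columns on import
anomaly2 <- read.csv(text = csv_text, strip.white = TRUE,
                     stringsAsFactors = TRUE)
is.factor(anomaly2$AttackType)
```

Converting before the split keeps the same factor levels in both the training and the test set.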
