What does nnet minimise?

Thanks for the reply, but I don't think I've made my question clear. My problem is not the implementation; I want to know which metric the nnet function is optimising. I don't think a reproducible example is relevant to that question. The only thing I can find in the documentation is the following, which doesn't really answer it:

Optimization is done via the BFGS method of optim.
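My working guess is that with softmax = TRUE the value printed in the trace is a cross-entropy / negative conditional log-likelihood (plus any weight-decay penalty) rather than a residual sum of squares, but that is exactly what I'd like to have confirmed. This is the kind of check I have in mind, using the model object from the reprex below (just a sketch, assuming decay = 0 and strictly positive fitted probabilities; I haven't run it):

# sketch: compare the reported final value against two candidate criteria
targets <- class.ind(red_train[, 12])    # same 0/1 indicator matrix passed to nnet()
probs   <- red_nn_model$fitted.values    # fitted class probabilities
c(reported      = red_nn_model$value,            # "value of fitting criterion plus weight decay term" (?nnet)
  cross_entropy = -sum(targets * log(probs)),    # candidate 1: negative conditional log-likelihood
  sse           = sum((targets - probs)^2))      # candidate 2: sum of squared errors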

But still, if it helps, here's what I have done:

# loading package
library(package = "nnet")

# loading dataset
red_wine <- read.csv2(file = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
                      header = TRUE)

# modifying dataset to avoid very low class proportions
red_wine$quality <- sapply(X = red_wine[, 12],
                           FUN = function(x)
                           {
                             if((x == 3) | (x == 4))
                             {
                               x <- "low"
                             } else if(x == 5)
                             {
                               x <- "lower_middle"
                             } else if(x == 6)
                             {
                               x <- "higher_middle"
                             } else
                             {
                               x <- "high"
                             }
                           })
red_wine$quality <- factor(x = red_wine[, 12],
                           levels = c("low",
                                      "lower_middle",
                                      "higher_middle",
                                      "high"),
                           ordered = TRUE)

# splitting train and test subsets
red_indices <- sample(x = c(TRUE, FALSE),
                      size = nrow(red_wine),
                      replace = TRUE,
                      prob = c(0.8, 0.2))
red_train <- red_wine[red_indices,]
red_test <- red_wine[!red_indices,]

# implementing single hidden layer neural network
no_hidden_nodes <- 30
max_iterations <- 500
red_nn_model <- nnet::nnet(x = red_train[, -12],
                           y = class.ind(red_train[, 12]),
                           size = no_hidden_nodes,
                           softmax = TRUE,
                           maxit = max_iterations,
                           trace = TRUE)
#> # weights:  484
#> initial  value 1830.332882 
#> iter  10 value 1320.948431
#> iter  20 value 1282.400645
#> iter  30 value 1215.595921
#> iter  40 value 1146.536261
#> iter  50 value 1093.389122
#> iter  60 value 1048.528644
#> iter  70 value 1017.228076
#> iter  80 value 992.588107
#> iter  90 value 982.810268
#> iter 100 value 978.270736
#> iter 110 value 971.337690
#> iter 120 value 954.402500
#> iter 130 value 928.415571
#> iter 140 value 900.070623
#> iter 150 value 879.767641
#> iter 160 value 858.583582
#> iter 170 value 840.634227
#> iter 180 value 828.451394
#> iter 190 value 827.021680
#> iter 200 value 824.994217
#> iter 210 value 823.199409
#> iter 220 value 819.632886
#> iter 230 value 815.776615
#> iter 240 value 810.148442
#> iter 250 value 804.609398
#> iter 260 value 799.187227
#> iter 270 value 794.894583
#> iter 280 value 791.952878
#> iter 290 value 791.093384
#> iter 300 value 790.699234
#> iter 310 value 790.200431
#> iter 320 value 787.894134
#> iter 330 value 784.905971
#> iter 340 value 783.498939
#> iter 350 value 781.796986
#> iter 360 value 780.267908
#> iter 370 value 778.546393
#> iter 380 value 775.098411
#> iter 390 value 772.903257
#> iter 400 value 770.701749
#> iter 410 value 769.321650
#> iter 420 value 768.203662
#> iter 430 value 767.204172
#> iter 440 value 766.122717
#> iter 450 value 765.488524
#> iter 460 value 764.656615
#> iter 470 value 764.062411
#> iter 480 value 763.643528
#> iter 490 value 763.381490
#> iter 500 value 763.266544
#> final  value 763.266544 
#> stopped after 500 iterations

# checking performance
predictions <- factor(x = predict(object = red_nn_model,
                                  newdata = red_test[, -12],
                                  type = "class"),
                      levels = c("low",
                                 "lower_middle",
                                 "higher_middle",
                                 "high"),
                      ordered = TRUE)
(confusion_matrix <- table(Predicted = predictions,
                           Actual = red_test[, 12]))
#>                Actual
#> Predicted       low lower_middle higher_middle high
#>   low             3            2             2    0
#>   lower_middle    8          102            50    2
#>   higher_middle   5           45            84   17
#>   high            0            2            18   21

Created on 2018-12-28 by the reprex package (v0.2.1)

As you can see, there's a lot of misclassification. I know I'll have to experiment with the number of hidden nodes, but I still don't think this much confusion between the 2nd and 3rd classes is expected, as there is plenty of data for both of them.
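On the trial-and-error point, this is roughly the kind of search over the hidden-layer size (and weight decay) that I have in mind. It's only a sketch with a hypothetical helper test_accuracy(), not code I've actually run:

# hypothetical helper: test-set accuracy of a fit from the matrix interface
test_accuracy <- function(fit) {
  probs <- predict(fit, newdata = red_test[, -12], type = "raw")
  # output columns follow the class.ind()/levels() order of the training factor
  pred <- levels(red_train[, 12])[max.col(probs)]
  mean(pred == as.character(red_test[, 12]))
}

# sketch: small grid over size and decay, scored on the held-out test set
grid <- expand.grid(size = c(5, 10, 20, 30), decay = c(0, 0.01, 0.1))
grid$accuracy <- vapply(seq_len(nrow(grid)), function(i) {
  fit <- nnet::nnet(x = red_train[, -12],
                    y = class.ind(red_train[, 12]),
                    size = grid$size[i],
                    decay = grid$decay[i],
                    softmax = TRUE,
                    maxit = max_iterations,
                    trace = FALSE)
  test_accuracy(fit)
}, numeric(1))
grid[order(-grid$accuracy), ]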


@Alex

Thanks for the response.

I am not really comparing or combining the two models for red and white wines. I am considering the two data sets separately.

I understand that a neural net is not completely deterministic, but as you'll see from my example above, a lot of observations from the 2nd class are misclassified into the 3rd class, and vice versa. Each of those two classes accounts for roughly 40% of the data set, so this is very surprising to me. I would have expected far more misclassifications for the 1st and 4th classes.
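For reference, this is how the rough 40% figure for each of the two middle classes can be checked:

# class proportions of the recoded quality variable
round(prop.table(table(red_wine$quality)), 2)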

That being said, I repeated this around 20 times and got perfect classification twice. Otherwise, the misclassification patterns were more or less the same.
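To be concrete about what I repeated: it was essentially refitting the same network with fresh random starting weights and checking the test accuracy each time, along these lines (a sketch reusing the hypothetical test_accuracy() helper from my earlier sketch, not the exact code I ran):

# sketch: refit 20 times; the variation comes from the random initial weights
accuracies <- replicate(20, {
  fit <- nnet::nnet(x = red_train[, -12],
                    y = class.ind(red_train[, 12]),
                    size = no_hidden_nodes,
                    softmax = TRUE,
                    maxit = max_iterations,
                    trace = FALSE)
  test_accuracy(fit)
})
summary(accuracies)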