Thanks for the reply, but I don't think I made my question clear. My problem is not the implementation; I want to know which metric the nnet
function optimises. I don't think a reproducible example is relevant to that question. The only thing I can find in the documentation is the following, which doesn't really answer it:
Optimization is done via the BFGS method of optim.
But still, if it helps, here's what I have done:
# loading package
library(package = "nnet")

# loading dataset
red_wine <- read.csv2(file = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
                      header = TRUE)

# modifying dataset to avoid very low class proportions
red_wine$quality <- sapply(X = red_wine[, 12],
                           FUN = function(x)
                           {
                             if ((x == 3) | (x == 4))
                             {
                               x <- "low"
                             } else if (x == 5)
                             {
                               x <- "lower_middle"
                             } else if (x == 6)
                             {
                               x <- "higher_middle"
                             } else
                             {
                               x <- "high"
                             }
                           })
red_wine$quality <- factor(x = red_wine[, 12],
                           levels = c("low",
                                      "lower_middle",
                                      "higher_middle",
                                      "high"),
                           ordered = TRUE)

# splitting train and test subsets
red_indices <- sample(x = c(TRUE, FALSE),
                      size = nrow(red_wine),
                      replace = TRUE,
                      prob = c(0.8, 0.2))
red_train <- red_wine[red_indices, ]
red_test <- red_wine[!red_indices, ]

# implementing single hidden layer neural network
no_hidden_nodes <- 30
max_iterations <- 500
red_nn_model <- nnet::nnet(x = red_train[, -12],
                           y = class.ind(red_train[, 12]),
                           size = no_hidden_nodes,
                           softmax = TRUE,
                           maxit = max_iterations,
                           trace = TRUE)
#> # weights: 484
#> initial value 1830.332882
#> iter 10 value 1320.948431
#> iter 20 value 1282.400645
#> iter 30 value 1215.595921
#> iter 40 value 1146.536261
#> iter 50 value 1093.389122
#> iter 60 value 1048.528644
#> iter 70 value 1017.228076
#> iter 80 value 992.588107
#> iter 90 value 982.810268
#> iter 100 value 978.270736
#> iter 110 value 971.337690
#> iter 120 value 954.402500
#> iter 130 value 928.415571
#> iter 140 value 900.070623
#> iter 150 value 879.767641
#> iter 160 value 858.583582
#> iter 170 value 840.634227
#> iter 180 value 828.451394
#> iter 190 value 827.021680
#> iter 200 value 824.994217
#> iter 210 value 823.199409
#> iter 220 value 819.632886
#> iter 230 value 815.776615
#> iter 240 value 810.148442
#> iter 250 value 804.609398
#> iter 260 value 799.187227
#> iter 270 value 794.894583
#> iter 280 value 791.952878
#> iter 290 value 791.093384
#> iter 300 value 790.699234
#> iter 310 value 790.200431
#> iter 320 value 787.894134
#> iter 330 value 784.905971
#> iter 340 value 783.498939
#> iter 350 value 781.796986
#> iter 360 value 780.267908
#> iter 370 value 778.546393
#> iter 380 value 775.098411
#> iter 390 value 772.903257
#> iter 400 value 770.701749
#> iter 410 value 769.321650
#> iter 420 value 768.203662
#> iter 430 value 767.204172
#> iter 440 value 766.122717
#> iter 450 value 765.488524
#> iter 460 value 764.656615
#> iter 470 value 764.062411
#> iter 480 value 763.643528
#> iter 490 value 763.381490
#> iter 500 value 763.266544
#> final value 763.266544
#> stopped after 500 iterations
# checking performance
predictions <- factor(x = predict(object = red_nn_model,
                                  newdata = red_test[, -12],
                                  type = "class"),
                      levels = c("low",
                                 "lower_middle",
                                 "higher_middle",
                                 "high"),
                      ordered = TRUE)
(confusion_matrix <- table(Predicted = predictions,
                           Actual = red_test[, 12]))
#> Actual
#> Predicted low lower_middle higher_middle high
#> low 3 2 2 0
#> lower_middle 8 102 50 2
#> higher_middle 5 45 84 17
#> high 0 2 18 21
Created on 2018-12-28 by the reprex package (v0.2.1)
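Since my question is about the criterion behind the "value" numbers in that trace, here is a minimal self-contained sketch of how one could test the hypothesis that a softmax fit minimises the cross-entropy (my reading of the help page, not a confirmed fact). It uses the built-in iris data so it runs on its own:

```r
library(nnet)

set.seed(1)
targets <- class.ind(iris$Species)
fit <- nnet(x = iris[, 1:4], y = targets, size = 2,
            softmax = TRUE, maxit = 200, trace = FALSE)

# cross-entropy computed by hand from the fitted class probabilities;
# only the true-class probability enters the sum
p_true <- fit$fitted.values[cbind(seq_len(nrow(iris)),
                                  max.col(targets))]
manual_value <- -sum(log(p_true))

# should agree with the reported criterion if the hypothesis is right
# (with the default decay = 0, fit$value has no penalty term)
all.equal(manual_value, fit$value)
```

If these two numbers match, the "value" in the trace is the cross-entropy (negative conditional log-likelihood) plus the weight-decay penalty, which is zero here.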
As you can see, there's a lot of misclassification. I know that I'll have to do trial and error with the number of hidden nodes. Still, I wouldn't expect this many misclassifications between the 2nd and 3rd classes, since there's a lot of data in those two classes.
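For reference, the trial-and-error loop over hidden-layer sizes could be sketched like this (shown with the built-in iris data so it runs on its own; the size grid, the tune_sizes helper, and the accuracy metric are illustrative choices, not recommendations):

```r
library(nnet)

# fit one nnet per candidate size and return test-set accuracy for each
tune_sizes <- function(train, test, target_col, sizes) {
  labs <- colnames(class.ind(train[, target_col]))  # class labels in column order
  sapply(sizes, function(n_nodes) {
    fit <- nnet(x = train[, -target_col],
                y = class.ind(train[, target_col]),
                size = n_nodes, softmax = TRUE,
                maxit = 500, trace = FALSE)
    probs <- predict(fit, newdata = test[, -target_col])  # class probabilities
    preds <- labs[max.col(probs)]                         # predicted labels
    mean(preds == as.character(test[, target_col]))       # test-set accuracy
  })
}

set.seed(1)
idx <- sample(c(TRUE, FALSE), nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
acc <- tune_sizes(iris[idx, ], iris[!idx, ], target_col = 5,
                  sizes = c(2, 5, 10))
acc
```

Because each fit starts from random weights, repeating the loop with different seeds (or averaging over restarts) gives a fairer comparison than a single run per size.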
@Alex
Thanks for the response.
I am not really comparing or combining the two models for red and white wines. I am considering the two data sets separately.
I understand that a neural net is not completely deterministic, but as you will note from my example above, a lot of observations from the 2nd class are misclassified into the 3rd class, and vice versa. Each of these classes holds approximately 40% of the data set, so this is very surprising to me. I would have expected far more misclassifications in the 1st and 4th classes.
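The class-balance claim can be checked directly from the data (this re-downloads the file, so it needs a network connection; the cut points mirror the recoding in my reprex above):

```r
# class proportions after the same low / lower_middle / higher_middle / high
# recoding used in the reprex (quality 3-4, 5, 6, and 7+ respectively)
red_wine <- read.csv2("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv")
quality_class <- cut(red_wine$quality,
                     breaks = c(-Inf, 4, 5, 6, Inf),
                     labels = c("low", "lower_middle",
                                "higher_middle", "high"))
round(prop.table(table(quality_class)), 2)
```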
That being said, I repeated this around 20 times and got perfect classification twice. Otherwise, the misclassification patterns were more or less the same.