Of course, the documentation describes the fitting criterion, but as one might expect from Brian Ripley & William Venables, it's a bit terse. It also has to be noted that the vast majority of the packages developed by these two geniuses, to whom we all owe a great deal, were expected to be used with this book by your side:
http://www.stats.ox.ac.uk/pub/MASS4/
So, that also contributes to the terseness of the documentation (users of their packages are expected to be familiar with the book contents, and with their documentation style). Here's the fitting criterion:
linout: switch for linear output units. Default logistic output units.
entropy: switch for entropy (= maximum conditional likelihood) fitting. Default by least-squares.
softmax: switch for softmax (log-linear model) and maximum conditional likelihood fitting. linout, entropy, softmax and censored are mutually exclusive.
censored: a variant on softmax, in which non-zero targets mean possible classes. Thus for softmax a row of (0, 1, 1) means one example each of classes 2 and 3, but for censored it means one example whose class is only known to be 2 or 3.
What these lines are saying is that the parameters linout, entropy, softmax and censored are mutually exclusive, and that they define the loss function. In particular, according to this documentation I would think that for classification the best setting would be softmax = TRUE:
softmax: switch for softmax (log-linear model) and maximum conditional likelihood fitting. linout, entropy, softmax and censored are mutually exclusive.
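For what it's worth, here's a minimal sketch of how you could turn that switch on. It's my own adaptation of the iris example from the help page (not the original code), with the targets given as a class-indicator matrix built by class.ind(), exactly as in the documentation:

```r
## My own adaptation of the help-page iris example: softmax = TRUE switches the
## fitting criterion to maximum conditional likelihood instead of the
## least-squares default.
library(nnet)

set.seed(1)
ir      <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])
targets <- class.ind(c(rep("s", 50), rep("c", 50), rep("v", 50)))
samp    <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))

ir.sm <- nnet(ir[samp, ], targets[samp, ], size = 2, rang = 0.1,
              decay = 5e-4, maxit = 200, softmax = TRUE)
```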
However, what puzzles me is that the example in the documentation, which illustrates precisely how to use nnet to perform classification on the iris dataset (thus, a problem fairly similar to yours), doesn't set any of these four parameters to TRUE. According to my interpretation of the documentation, this means that, in this example, Venables & Ripley are effectively using least squares as the fitting criterion:
entropy: switch for entropy (= maximum conditional likelihood) fitting. Default by least-squares.
As we can read, the default (entropy = FALSE) should correspond to least-squares fitting.
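If you'd rather check this empirically than by reading between the lines, one quick sanity check (again my own sketch, not something from the help page) is to fit the example with all four switches left at their defaults and print the result, since print() on a fitted nnet reports the options that were used:

```r
## My own quick check: fit the documented example with the default switches and
## look at what print() reports about the fitting options.
library(nnet)

set.seed(1)
ir      <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])
targets <- class.ind(c(rep("s", 50), rep("c", 50), rep("v", 50)))
samp    <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))

ir.def <- nnet(ir[samp, ], targets[samp, ], size = 2, rang = 0.1,
               decay = 5e-4, maxit = 200)

# If I remember correctly, this prints something like
#   a 4-2-3 network with 19 weights
#   options were - decay=5e-04
# i.e. neither entropy nor softmax fitting is mentioned, which is consistent
# with least squares being the default criterion.
print(ir.def)
```

With softmax = TRUE, as in the sketch above, the same print() call should instead mention the softmax option.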
However, rather than trying to perform the exegesis (a term that Bill Venables used to love) of the text, why don't you use keras or h2o? I'm not sure I understand why you have to use nnet.
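To give an idea of why I suggest them: in keras, for instance, the loss function is an explicit argument, so there's no ambiguity at all about the fitting criterion. A rough sketch (assuming you have the keras R package with a working TensorFlow backend installed; the layer sizes and training settings below are just placeholders, not a recommendation) could look like this:

```r
## A rough sketch of the keras alternative: the loss is stated explicitly, so
## there is no doubt about which criterion is being optimised.
library(keras)

x <- as.matrix(iris[, 1:4])
y <- to_categorical(as.integer(iris$Species) - 1L, num_classes = 3)

model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu", input_shape = 4) %>%
  layer_dense(units = 3, activation = "softmax")

model %>% compile(
  loss      = "categorical_crossentropy",  # the fitting criterion, spelled out
  optimizer = "adam",
  metrics   = "accuracy"
)

history <- model %>% fit(x, y, epochs = 100, batch_size = 16, verbose = 0)
```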
PS: if you're absolutely determined to use nnet, and you want to be 100% sure which loss function is being used, you may try to ask a question here:
https://stat.ethz.ch/mailman/listinfo/r-help
Be warned that the bar for posting on the R-help mailing list is set relatively high, so I strongly recommend against posting there before you've become very familiar with this document:
https://www.r-project.org/posting-guide.html
Among other things, you'll need to be familiar with the documentation of nnet before posting, include a reproducible example in your post (like the one you posted here), and try to be as specific and clear as possible about your question.
Finally, another thing that could help you with nnet: maybe you could even try to contact Greer Humphrey by mail and ask her directly which loss function is being used (I believe she's very familiar with nnet), though she may of course redirect you to either the R-help mailing list or to Stack Overflow. On Stack Overflow, the current policy is to close posts which ask for details about specific packages, so you might not be able to ask there.
I say, save yourself the hassle and use one of the other two packages, but of course the choice is yours. Best of luck!