# Multinominal logistic regression - bad accuracy

Hi, and welcome!

There's a convention here about questions that are part of degree requirements, which is shorthanded as homework, and another about reproducible example, called a reprex, both of which will help you get improved responses and better answers.

So, 1) understand that we will help explain and point you in the right direction, 2) we may provide code for specific portions of code, but won't give you a complete solution, and 3) it really helps to have a `reprex` that includes the data or a representative extract, rather than

``````tabelle<- read.csv("tabelleohneausreißer.csv",sep=";")
``````

which is nowhere to be found.

So, let's start from the basics.

Your problem involves a multinominal dependent variable, and you are applying a script at the link that states

1. Your dependent variable must be Nominal . This does not mean that multinomial regression cannot be used for the ordinal variable. However, for multinomial regression, we need to run ordinal logistic regression.

The first step in getting back on track is reproducing the example you are working from

``````library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(rattle.data)
data(wine)
train <- sample_frac(wine, 0.7)
sample_id <- as.numeric(rownames(train)) # rownames() returns character so as.numeric
test <- wine[-sample_id,]
head(test) # shortened from example
#>     Type Alcohol Malic  Ash Alcalinity Magnesium Phenols Flavanoids
#> 126    2   12.07  2.16 2.17       21.0        85    2.60       2.65
#> 127    2   12.43  1.53 2.29       21.5        86    2.74       3.15
#> 128    2   11.79  2.13 2.78       28.5        92    2.13       2.24
#> 129    2   12.37  1.63 2.30       24.5        88    2.22       2.45
#> 130    2   12.04  4.30 2.38       22.0        80    2.10       1.75
#> 131    3   12.86  1.35 2.32       18.0       122    1.51       1.25
#>     Nonflavanoids Proanthocyanins Color  Hue Dilution Proline
#> 126          0.37            1.35  2.76 0.86     3.28     378
#> 127          0.39            1.77  3.94 0.69     2.84     352
#> 128          0.58            1.76  3.00 0.97     2.44     466
#> 129          0.40            1.90  2.12 0.89     2.78     342
#> 130          0.42            1.35  2.60 0.79     2.57     580
#> 131          0.21            0.94  4.10 0.76     1.29     630
train <- sample_frac(wine, 0.7)
sample_id <- as.numeric(rownames(train)) # rownames() returns character so as.numeric
test <- wine[-sample_id,]
train\$Type <- relevel(train\$Type, ref = "3")
require(nnet)
#> Loading required package: nnet
multinom.fit <- multinom(Type ~ Alcohol + Color -1, data = train)
#> # weights:  9 (4 variable)
#> initial  value 137.326536
#> iter  10 value 78.435565
#> final  value 78.365718
#> converged
exp(coef(multinom.fit))
#>    Alcohol      Color
#> 1 1.482424 0.44367306
#> 2 2.541586 0.08372892
head(probability.table <- fitted(multinom.fit))
#>             3         1            2
#> 1 0.006129304 0.0905185 0.9033521930
#> 2 0.047171515 0.3049780 0.6478504747
#> 3 0.836364425 0.1634788 0.0001567621
#> 4 0.332745275 0.6129917 0.0542630372
#> 5 0.598866043 0.3970107 0.0041232804
#> 6 0.025020422 0.2518637 0.7231158655
train\$precticed <- predict(multinom.fit, newdata = train, "class")
ctable <- table(train\$Type, train\$precticed)
round((sum(diag(ctable))/sum(ctable))*100,2)
#> [1] 71.2
test\$precticed <- predict(multinom.fit, newdata = test, "class")
ctable <- table(test\$Type, test\$precticed)
round((sum(diag(ctable))/sum(ctable))*100,2)
#> [1] 9.43
``````

Created on 2019-11-05 by the reprex package (v0.3.0)

Now, I didn't spend a lot of time on this, but I did notice that the statements

Accuracy in training dataset is 68.8%

and

The accuracy of the test dataset turns out to be 18.4% less as compared to training dataset

are inconsistent with the results of the code.

That raises a meta-question: Is this the right example?