I have a categorical variable with N factor levels (e.g. gender has two levels) in a binary classification problem. I have converted it into dummy variables (male and female).
I have to use a neural network (nnet) for the classification. I have two options:
Include any N-1 dummy variables in the input data (e.g. include either male or female). In statistical models, we use N-1 dummy variables.
Include all N dummy variables (e.g. include both male and female).
Can someone please highlight the pros and cons of both options in terms of predictive power and interpretability?
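If you have a bias term in the model, the best bet would be to use all but one of the dummy variables (N-1). Otherwise, the N dummy columns always sum to one, which duplicates the constant column behind the bias term. That induces a linear dependency in the predictor matrix, and this can cause numerical issues.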
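To make the linear dependency concrete, here is a minimal sketch in Python with NumPy (not nnet itself). The toy data and the variable names are made up for illustration. It only looks at the input matrix that a first layer with a bias would see, not at a fitted network.

```python
import numpy as np

# Hypothetical toy data: 0 = male, 1 = female
gender = np.array([0, 1, 1, 0, 1, 0])

# Dummy (indicator) columns for each level
male = (gender == 0).astype(float)
female = (gender == 1).astype(float)

# Constant column standing in for the bias term
bias = np.ones_like(male)

# Option 2: bias plus all N dummies (3 columns)
X_all = np.column_stack([bias, male, female])
# Option 1: bias plus N-1 dummies (2 columns)
X_drop = np.column_stack([bias, female])

# The dummies sum to the bias column, so X_all is rank deficient
rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)

print("columns:", X_all.shape[1], "rank:", rank_all)
print("columns:", X_drop.shape[1], "rank:", rank_drop)
```

With all N dummies the matrix has three columns but only two independent directions, so different weight combinations give identical inputs to the hidden units. Dropping one dummy removes the redundancy without losing any information.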
These models are not directly explainable (with 1+ hidden units), so the choice doesn't matter from an interpretability point of view.