Should I include all dummy variables or N-1 dummy variables (keep one as reference) in neural networks?

I have a categorical variable with N factor levels (e.g. gender has two levels) in a binary classification problem. I have converted it into dummy variables (male and female).

I have to use a neural network (nnet) for classification. I have two options:

  1. Include any N-1 dummy variables in the input data (e.g. include either male or female). In statistical models, we use N-1 dummy variables.
  2. Include all N dummy variables (e.g. include both male and female)

Can someone please highlight the pros and cons of both options in terms of predictive power and interpretability?

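If you have a bias term in the model, the best bet would be to use all but one (N-1). Otherwise, the N dummy columns always sum to 1, which duplicates the constant that the bias term represents. That induces a linear dependency in the predictor matrix and this can/will cause numerical issues.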
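
To make the linear dependency concrete, here is a minimal sketch in plain Python (not nnet itself). The gender values are made up, and `matrix_rank` is a hypothetical helper written for this illustration, not a function from any library. It compares the rank of the predictor matrix (a constant bias column plus the dummies) for both options.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on the earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Made-up example data (assumption, not from the question).
genders = ["male", "female", "female", "male", "female"]

# Option 2: bias column (constant 1) + male dummy + female dummy.
all_dummies = [[1, int(g == "male"), int(g == "female")] for g in genders]

# Option 1: bias column + male dummy only (female is the reference level).
drop_one = [[1, int(g == "male")] for g in genders]

rank_all = matrix_rank(all_dummies)
rank_drop = matrix_rank(drop_one)

print(rank_all, "of", len(all_dummies[0]), "columns independent")  # 2 of 3 columns independent
print(rank_drop, "of", len(drop_one[0]), "columns independent")    # 2 of 2 columns independent
```

With all N dummies the matrix has three columns but only rank 2, because male + female equals the bias column in every row. Dropping one level gives a full-rank matrix that carries the same information.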

These models are not directly explainable (with one or more hidden units), so the choice doesn't matter from that point of view.

Thank you very much!
