Ordinal logistic regression or classification tree?

Hi there,

I am new to R and getting stuck on some points. I've tried posting in other communities and have trawled through the literature, but am finding conflicting texts and opinions. Any additional insight here would be really helpful.

I started off wanting to carry out ordinal logistic regression. I have an ordinal response variable with 3 levels and ~15 IVs, most of which are categorical. Since 2 of my IVs violate the proportional odds assumption, I decided to use vglm() in the VGAM package, which allows the assumption to be relaxed for specified variables. I have also tried out clm() from the ordinal package. My first problem is that the coefficients produced by the two packages have opposite signs. I can see that this can be addressed with the 'reverse=TRUE' argument in vglm(). However, I can't work out what this argument actually means. Can anyone shed some light on this? How do I determine which way round the coefficient signs should be?

Further to that, I am aware that classification trees may be an alternative route to go down. Does the proportional odds/parallel lines assumption come into this? Why would you choose a classification tree over ordinal logistic regression?

Thanks in advance!

There is usually a trade-off between how explainable a model is and how accurately it predicts. With ordinal logistic regression you keep the information that your response variable is ordinal, and the coefficients you get give you a way to check whether the model makes sense.
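On the sign question, a small numeric sketch may help. It is plain Python rather than R, and the thresholds, slope, and function names are made up for illustration. As far as I understand the documentation, clm() writes the model as logit P(Y <= j) = theta_j - beta*x, while the cumulative() family used with vglm() defaults to logit P(Y <= j) = alpha_j + b*x, and its reverse = TRUE option models P(Y > j) instead. If that reading is right, the three forms describe the same probabilities and differ only in the signs of the coefficients:

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def probs_minus_form(thresholds, beta, x):
    """Category probabilities when logit P(Y <= j) = theta_j - beta * x."""
    cum = [logistic(t - beta * x) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def probs_plus_form(intercepts, b, x):
    """Category probabilities when logit P(Y <= j) = alpha_j + b * x."""
    cum = [logistic(a + b * x) for a in intercepts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def probs_reversed_form(intercepts, b, x):
    """Category probabilities when logit P(Y > j) = alpha_j + b * x."""
    upper = [1.0] + [logistic(a + b * x) for a in intercepts] + [0.0]
    return [upper[j] - upper[j + 1] for j in range(len(upper) - 1)]


# Hypothetical values for a 3-level response and a single predictor.
x = 1.5
print(probs_minus_form([-0.5, 1.0], 0.8, x))     # theta_j - beta * x
print(probs_plus_form([-0.5, 1.0], -0.8, x))     # slope sign flipped
print(probs_reversed_form([0.5, -1.0], 0.8, x))  # intercept signs flipped
```

All three calls print the same three probabilities. In the "minus" and "reversed" forms a positive slope means that larger x pushes towards the higher categories; in the "plus" form a positive slope pushes towards the lower ones. So neither sign is wrong: check which probability each function models and interpret the coefficients accordingly.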

With decision trees I'm not sure if there is a way to predict an ordinal variable directly, but you can always treat it as a categorical variable with 3 levels and predict the level directly. However, since a tree is a non-parametric model, you'll have a harder time understanding what the model does. Nevertheless, there are several things you can do to sanity-check it. With a random forest you get variable importance, which you can examine to make sure the model is doing something reasonable. There is also the lime package, which you can use to check why the model predicts what it does.
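One generic workaround for keeping the ordering with any classifier, trees included, is to fit one binary model per cut point ("is Y above level 1?", "is Y above level 2?") and then combine the predicted probabilities. The sketch below shows only the combining step, in plain Python; the function name and the input numbers are hypothetical, and the binary models themselves are assumed to have been fitted elsewhere:

```python
def ordinal_probs_from_binary(p_greater):
    """Turn estimates of P(Y > 1), ..., P(Y > K-1) into K class probabilities.

    The binary models are fitted separately, so their outputs need not be
    monotone. Negative differences are clipped to zero and the result is
    renormalised so that it sums to one.
    """
    upper = [1.0] + list(p_greater) + [0.0]
    raw = [max(upper[j] - upper[j + 1], 0.0) for j in range(len(upper) - 1)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical outputs of two binary models for one observation:
# P(Y > 1) = 0.7 and P(Y > 2) = 0.2.
probs = ordinal_probs_from_binary([0.7, 0.2])
predicted_level = probs.index(max(probs)) + 1  # levels numbered 1..K
print(probs, predicted_level)
```

Note that no proportional odds assumption is involved here: each cut point gets its own model, so the predictors are free to act differently at each one.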

Personally, I tend to use decision trees most of the time since they are conceptually simple and tend to perform well out of the box. If you don't care too much about predictive performance, then it may be better to stick with the more explainable models you are using at the moment.

Thank you for your detailed reply. I have been looking into decision trees today. I think the best approach for me right now is to use a more explainable model, partly because I understand those better at the moment. I will read up more on random forests etc.
Thanks!