Hi,

I'd like to hear whether anyone has tried using one of the methods for establishing causality in the context of an ordered logistic or ordered probit regression. To estimate an ordered logit I use the polr function from the MASS package, but I don't know whether any of the causal methods can be used with polr and, if so, how.

It could be instrumental variables (IV), regression discontinuity design (RDD), matching, or difference-in-differences. If you could point me to any kind of source, that would also really help.

I would really appreciate any kind of answer!!

Causality can't be analyzed without domain knowledge, which is what lets you tease apart variables that have an independent effect from those that are masked ("confounded") by others. The tool for this is the directed acyclic graph (DAG), and the {dag} package does this. A popular explanation is given in *The Book of Why* by Judea Pearl, whose classic text *Causality: Models, Reasoning and Inference* provides a technical exposition.

Thanks for your comment! I had actually never heard of DAGs or mediation before. So let's say I want to estimate the effect of working hours on self-reported health: can I use this package for that? To be honest, I didn't really understand what mediation is or what it is for, but I will try to look into it much more.

Note:

Health is measured on a 0-10 scale.

Working hours is the difference between actual hours and preferred hours. Based on this I created two dummies: one for working more than preferred and one for working less than preferred. The effect of these two dummies on self-reported health is what I want to estimate.

Following along from what @technocrat said, you need additional information to estimate a causal model. There is no statistical technique that will solve the problem without more information. For example, you could do an estimate using instrumental variables but you would have to have an instrument correlated with working hours (easy to find) that is not correlated with the error in the equation (very hard to find).
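To make that concrete, here is a minimal simulated sketch in R, not an analysis of your data. Everything in it is made up for illustration: the variable names (`unpredictable`, `overwork`, `health`), the effect sizes, and the cut points. The true effect of `overwork` on health is set to zero, yet the naive `polr` fit should report a clearly negative coefficient because a confounder drives both variables. The adjusted fit is only possible here because the simulation lets us observe the confounder, which is exactly the extra information you usually lack.

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 5000

# Hypothetical confounder (e.g. unpredictable scheduling), coded 0/1
unpredictable <- rbinom(n, 1, 0.4)

# Working more than preferred is more likely when scheduling is unpredictable
overwork <- rbinom(n, 1, ifelse(unpredictable == 1, 0.7, 0.2))

# Latent health depends ONLY on the confounder: true effect of overwork is zero
latent <- -1.5 * unpredictable + rlogis(n)

# Cut the latent variable into an ordered 0-10 rating (cut points are arbitrary)
health <- cut(latent,
              breaks = c(-Inf, seq(-3, 1.5, by = 0.5), Inf),
              labels = 0:10,
              ordered_result = TRUE)

d <- data.frame(health, overwork, unpredictable)

# Naive model: omits the confounder
naive <- polr(health ~ overwork, data = d, Hess = TRUE)

# Adjusted model: only possible because the simulation exposes the confounder
adjusted <- polr(health ~ overwork + unpredictable, data = d, Hess = TRUE)

# Compare the overwork coefficient (log-odds scale) across the two fits
print(coef(naive)["overwork"])
print(coef(adjusted)["overwork"])
```

The point is that both models fit without complaint; nothing in the `polr` output tells you which one is closer to the causal effect.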

"Correlation does not imply causation" is something you see a lot, because all sorts of things that are correlated have nothing else in common.

It *is* possible to measure **association** between the variables excess/deficit hours worked and the health self-rating if that's all you have. If you have additional variables, such as gender, age, education, type of work, etc., those may need to be controlled for. To take a recent example:

A nationwide rail strike was threatened over working conditions. As reported, the primary complaint was unpredictable working hours (being subject to a call to work with no notice) and being unable to take time off for things like medical appointments unless scheduled a year in advance. This type of job is a *confounder* because the inability to schedule healthcare may well lead to lower reported scores, even though the *number* of hours above/below preferred is not great.

It's easier to imagine other situations: the young, healthy investment banker able to work insane hours but getting insane money.

You would need to control for those situations by taking them out of the pool and running them separately, if there are enough of them.
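As a rough sketch of what that could look like with `polr`, the R code below fits one pooled model with controls and then separate models per job type. The data are simulated and every name in it (`job_type`, `age10`, `more_than_preferred`, `less_than_preferred`) is a hypothetical stand-in for your own columns; the 200-row minimum for a subgroup is an arbitrary choice, not a rule. The coefficients describe association given the controls, not causal effects.

```r
library(MASS)  # provides polr()

set.seed(2)
n <- 3000

# Simulated stand-in data; replace with your own data frame
d <- data.frame(
  job_type = factor(sample(c("rail", "banking", "other"), n,
                           replace = TRUE, prob = c(0.2, 0.2, 0.6))),
  age10 = (round(runif(n, 20, 65)) - 40) / 10,  # age centred at 40, in decades
  more_than_preferred = rbinom(n, 1, 0.3)
)
# The two dummies are mutually exclusive
d$less_than_preferred <- ifelse(d$more_than_preferred == 1, 0, rbinom(n, 1, 0.3))

# Made-up data-generating process for the ordered 0-10 health rating
latent <- -0.4 * d$more_than_preferred - 0.2 * d$less_than_preferred -
  0.2 * d$age10 - 1.0 * (d$job_type == "rail") + rlogis(n)
d$health <- cut(latent,
                breaks = c(-Inf, seq(-3, 1.5, by = 0.5), Inf),
                labels = 0:10,
                ordered_result = TRUE)

# Pooled ordered logit with observed covariates as controls
pooled <- polr(health ~ more_than_preferred + less_than_preferred + age10 + job_type,
               data = d, Hess = TRUE)

# Separate fits per job type, skipping subgroups that are too small
by_job <- lapply(split(d, d$job_type), function(s) {
  if (nrow(s) < 200) return(NULL)       # arbitrary minimum subgroup size
  s$health <- droplevels(s$health)      # drop ratings unused in this subgroup
  polr(health ~ more_than_preferred + less_than_preferred + age10,
       data = s, Hess = TRUE)
})
by_job <- Filter(Negate(is.null), by_job)

# Coefficient on working more than preferred, pooled and per job type
print(coef(pooled)["more_than_preferred"])
print(sapply(by_job, function(m) coef(m)["more_than_preferred"]))
```

If the per-group coefficients differ a lot from each other or from the pooled one, that is a hint that job type is doing more than shifting the baseline.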

If these concepts seem unclear, an introductory text on statistics could be useful for you. Dalgaard is a good choice.