GLM Over-dispersion Testing



I am using the following piece of code to check for over-dispersion in my GLM (generalized linear model). It compares the residual deviance with the residual degrees of freedom of the model by computing an upper-tail chi-squared probability. If there is no over-dispersion, these two values should be roughly equal. However, I am unlikely to fit a perfect model, so the code will give me some value between 0 and 1.

pchisq(model2$deviance, df=model2$df.residual, lower.tail=FALSE)

I was wondering above what value of the output from this code the result is considered alright, so that the null hypothesis (that there is no over-dispersion) can be accepted.

E.g. >0.01 or >0.05? What value should I be aiming for?

Many thanks in advance.


Not a direct answer to your question, but these discussions on Cross Validated might be helpful to you:

More directly…

It sounds like you want to know whether the overdispersion is serious enough that you need to do something about it, or whether you can ignore it as inconsequential. As far as I know, there isn't consensus on this question. Various heuristics have been proposed for determining "how bad" overdispersion is (for instance, such opinions can be found in McCullagh & Nelder 1989, Crawley 2002, and Anderson et al. 1994). But some argue that you should always adjust for overdispersion, e.g. Young et al. 1999:

Overdispersion should be accounted for in a GLIM analysis. Failure to do so in the presence of overdispersion results in type I error rates well above the nominal ones. When overdispersion is not present, the test for treatment effects is not negatively affected by considering overdispersion.
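For a concrete picture, here is a minimal sketch in R of one common way to estimate and adjust for overdispersion. Everything in it is an assumption for illustration: the data are simulated, the names `fit_pois`, `fit_quasi`, and `pearson_dispersion` are made up, and a quasi-Poisson refit is only one of several possible adjustments (it is not necessarily the approach Young et al. had in mind).

```r
set.seed(1)  # simulated data, for illustration only
n <- 200
x <- runif(n)
# Negative binomial counts: the variance exceeds the mean,
# so a Poisson model fitted to them will be over-dispersed.
y <- rnbinom(n, mu = exp(0.5 + 1.0 * x), size = 1.5)

fit_pois <- glm(y ~ x, family = poisson)

# Hypothetical helper: Pearson chi-squared statistic divided by the
# residual degrees of freedom. Values near 1 are consistent with the
# variance assumed by the Poisson family.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}
dispersion <- pearson_dispersion(fit_pois)

# The check from the question: upper-tail chi-squared probability
# of the residual deviance.
p_value <- pchisq(deviance(fit_pois), df = df.residual(fit_pois),
                  lower.tail = FALSE)

# Quasi-Poisson refit: same coefficient estimates, but standard errors
# are scaled by the square root of the estimated dispersion.
fit_quasi <- glm(y ~ x, family = quasipoisson)
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]

print(c(dispersion = dispersion, p_value = p_value))
print(rbind(poisson = se_pois, quasipoisson = se_quasi))
```

The point of the comparison at the end is that the adjustment leaves the fitted coefficients alone and only widens the standard errors, which is why an unadjusted fit tends to be overconfident when overdispersion is present.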

The unsatisfying conclusion that applies to so much of statistics: It's probably better to think about an overdispersion diagnostic as one piece of information among many that helps inform your choices about your analysis, rather than as a binary decision rule. Overdispersion can be a sign of structural problems with your model (wrong choice of link function, missing covariates, outliers, etc) — you wouldn't want to ignore potentially fixable problems just because the degree of overdispersion was under some arbitrary value.
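To illustrate the point about structural problems, here is a small simulated sketch in R where apparent overdispersion comes purely from a missing covariate. The data, the variable names (`x1`, `x2`, `fit_reduced`, `fit_full`), and the helper `pearson_dispersion` are all hypothetical; this is a toy example, not a general diagnostic procedure.

```r
set.seed(42)  # simulated data, for illustration only
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# Counts are truly Poisson once BOTH covariates are in the model.
y <- rpois(n, lambda = exp(0.3 + 0.5 * x1 + 0.8 * x2))

# Hypothetical helper: Pearson chi-squared statistic divided by the
# residual degrees of freedom.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Model that omits x2: the unexplained variation due to x2 shows up
# as extra variance, i.e. apparent overdispersion.
fit_reduced <- glm(y ~ x1, family = poisson)

# Model with both covariates: the Poisson variance assumption holds.
fit_full <- glm(y ~ x1 + x2, family = poisson)

disp_reduced <- pearson_dispersion(fit_reduced)
disp_full    <- pearson_dispersion(fit_full)

print(c(reduced = disp_reduced, full = disp_full))
```

In a case like this, adding the missing covariate is a better fix than scaling the standard errors, which is one reason to treat the diagnostic as a prompt to look at the model rather than as a pass/fail threshold.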

Aside: accepting the null hypothesis…

I think it's worth being very precise with language when discussing the implications of null hypothesis significance testing (NHST), since it's easy to fall into common traps. I do not want to start an NHST battle here 😅, but if you're going to get really technical about it, there's an argument to be made that no p-value exists that will let you accept a null hypothesis. E.g., from Pernet 2017:

Therefore, one can only reject the null hypothesis if the test statistics falls into the critical region(s), or fail to reject this hypothesis. In the latter case, all we can say is that no significant effect was observed, but one cannot conclude that the null hypothesis is true. This is another common mistake in using NHST: there is a profound difference between accepting the null hypothesis and simply failing to reject it (Killeen, 2005). By failing to reject, we simply continue to assume that H0 is true, which implies that one cannot argue against a theory from a non-significant result (absence of evidence is not evidence of absence).

See also: