Statistics Forum Functional Ecology Research School: October 2006

First of all, long ago I picked up the idea that corrections for overdispersion are only necessary when the scale parameter is above 3 to 4. The idea is understandable (you should only correct if overdispersion is likely to be present), but the exact reason for the "3 to 4" limit is now unclear to me.
I tried to locate where I got the idea from, and it seems to be from here. I did not find any confirmation of that rule of thumb anywhere else.
However, Lindsey (1999) suggests, based on the analysis of examples, that corrections are necessary when the overdispersion parameter is at least two.
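The "scale parameter" in these rules of thumb is commonly estimated as the Pearson chi-square statistic divided by the residual degrees of freedom. Below is a minimal Python sketch. It assumes you already have fitted means from a Poisson GLM. The function names and the counts are hypothetical. The default threshold of 2 simply encodes Lindsey's suggestion; set it to 3 or 4 for the older rule. The correction shown is the usual quasi-likelihood one, which multiplies the model-based standard errors by the square root of the dispersion estimate.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom (Poisson GLM)."""
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("no residual degrees of freedom to estimate dispersion")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / df

def needs_correction(phi, threshold=2.0):
    """Rule of thumb: correct when phi >= threshold (2 follows Lindsey 1999)."""
    return phi >= threshold

def quasi_se(se, phi):
    """Quasi-likelihood correction: scale model-based standard errors by sqrt(phi)."""
    return [s * math.sqrt(phi) for s in se]

# Hypothetical counts; an intercept-only Poisson fit has fitted mean = sample mean (3.0)
y = [0, 1, 2, 3, 4, 8]
mu = [sum(y) / len(y)] * len(y)
phi = pearson_dispersion(y, mu, n_params=1)
print(round(phi, 3), needs_correction(phi))   # 2.667 True
```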

You should keep in mind, however, that overdispersion is impossible in some cases (so you should then never correct for it):

- when the dependent variable is a Bernoulli (0/1) variable
- when the maximal model considered is the saturated model
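In both cases there is nothing left from which to estimate an extra variance. For 0/1 data the variance is fixed by the mean at p(1 - p). A saturated model has one parameter per observation, so it has zero residual degrees of freedom. The following small Python illustration uses made-up data, and the function names are my own:

```python
def bernoulli_spread(y):
    """For 0/1 data the variance is p(1 - p): the mean alone fixes the spread."""
    n = len(y)
    p = sum(y) / n
    var = sum((yi - p) ** 2 for yi in y) / n   # divide-by-n variance
    return var, p * (1 - p)

def residual_df(n_obs, n_params):
    """A saturated model has one parameter per observation, so this is 0."""
    return n_obs - n_params

var, implied = bernoulli_spread([1, 0, 0, 1, 1, 0, 1, 1])
print(var, implied)          # equal (up to floating-point rounding)
print(residual_df(8, 8))     # 0: nothing left to estimate a dispersion from
```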

Apparent overdispersion can also be caused by

- using the wrong link function
- a missing covariate
- the necessity to transform covariates
- outliers

so you should check these potential causes.
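A quick simulation shows how an omitted covariate alone can produce apparent overdispersion. This is a hypothetical Python sketch. The data are Poisson with log mean 2x for a binary covariate x. The same Pearson dispersion estimate is computed twice. The first model omits x, so the fitted mean is the overall mean. The second includes x, so the fitted means are the group means, which are the Poisson maximum-likelihood fits for a single binary factor.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lambda)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / residual df."""
    return sum((a - m) ** 2 / m for a, m in zip(y, mu)) / (len(y) - n_params)

def simulate(n=1000, seed=1):
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]                        # hypothetical binary covariate
    y = [rpois(math.exp(2.0 * xi), rng) for xi in x]     # true model: log(mu) = 2x
    # Covariate omitted: intercept-only fit, fitted mean = overall mean
    overall = sum(y) / n
    phi_without = pearson_dispersion(y, [overall] * n, n_params=1)
    # Covariate included: fitted means = group means
    group_mean = {g: sum(yi for yi, xi in zip(y, x) if xi == g) / x.count(g)
                  for g in (0, 1)}
    phi_with = pearson_dispersion(y, [group_mean[xi] for xi in x], n_params=2)
    return phi_without, phi_with

print(simulate())
```

The model without x should give a dispersion well above 2. A back-of-the-envelope variance calculation for this mixture predicts about 3.4. The model with x should stay close to 1, so "correcting" the first model would hide a modelling error.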

Concerning underdispersion, Venables and Ripley (2002) show that dispersion estimates indicating underdispersion can be caused by small counts; see, e.g., MASS4, p. 208.
The ratio of residual deviance to degrees of freedom can be seriously biased downward for binomial data with extreme p (and small n), and for Poisson data with low lambda.
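The low-lambda effect for Poisson data can be checked without simulation by computing expected values directly. Below is a minimal Python sketch. It assumes the fitted mean equals the true lambda, so no estimation error is involved, and it truncates the Poisson sum far into the tail. It compares the expected unit deviance, which is the expected deviance/df ratio, with the expected Pearson ratio, which is exactly 1 for every lambda.

```python
import math

def poisson_logpmf(y, lam):
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def unit_deviance(y, mu):
    """Poisson unit deviance 2*[y*log(y/mu) - (y - mu)], with y*log(y/mu) = 0 at y = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))

def expected_ratios(lam):
    """Expected deviance and Pearson contributions per observation when mu = lam."""
    k_max = int(lam + 20 * math.sqrt(lam) + 20)   # truncate far into the tail
    dev = pear = 0.0
    for y in range(k_max + 1):
        p = math.exp(poisson_logpmf(y, lam))
        dev += p * unit_deviance(y, lam)
        pear += p * (y - lam) ** 2 / lam
    return dev, pear

for lam in (0.1, 0.5, 1.0, 5.0, 20.0):
    dev, pear = expected_ratios(lam)
    print(lam, round(dev, 3), round(pear, 3))
```

For lambda = 0.1 the expected deviance ratio comes out at roughly 0.47, and for lambda = 20 it is close to 1. The Pearson ratio stays at 1 throughout.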

Instead of relying on rules of thumb, one can also model the counts with distributions that allow both under- and overdispersion.
Some possibilities are the double binomial (or double Poisson) distribution and the multiplicative binomial (or multiplicative Poisson) distribution. Using AIC, you can then make the decision to correct for under- or overdispersion part of your model selection procedure.
You can fit GLMs with these distributions in R using the "gnlm" library by Jim Lindsey, although it is not easy to work with.
I have adapted some examples from Lindsey (1999), Models for Repeated Measurements, to show how you can fit such models.
The R input code is here, and the data file here.
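To make the double Poisson option more concrete, here is a minimal Python sketch. It is not Lindsey's gnlm code and not the R script linked above. It uses Efron's double Poisson kernel and normalises it by summing numerically. It fixes mu at the sample mean, which is the maximum-likelihood estimate when the normalising constant is ignored. It then grid-searches the dispersion parameter theta: theta < 1 means overdispersion, theta > 1 means underdispersion, and theta = 1 gives back the Poisson. Finally it compares AIC with a plain Poisson fit. The two data sets are made up for illustration.

```python
import math

def dp_log_kernel(y, mu, theta):
    """Log of Efron's double Poisson density without its normalising constant."""
    out = 0.5 * math.log(theta) - theta * mu - math.lgamma(y + 1)
    if y > 0:  # the y*log(y) terms vanish at y = 0
        out += -y + y * math.log(y) + theta * y * (1.0 + math.log(mu) - math.log(y))
    return out

def dp_loglik(data, mu, theta):
    """Log-likelihood with the kernel normalised numerically over 0..k_max."""
    k_max = int(max(max(data), mu + 20.0 * math.sqrt(mu / theta))) + 50
    logs = [dp_log_kernel(k, mu, theta) for k in range(k_max + 1)]
    top = max(logs)
    log_z = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
    return sum(dp_log_kernel(y, mu, theta) for y in data) - len(data) * log_z

def compare_aic(data):
    """AIC of a Poisson (1 parameter) vs a double Poisson (2 parameters) fit."""
    mu = sum(data) / len(data)
    pois_ll = sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in data)
    # Log-spaced grid for theta from exp(-4) to exp(4); includes theta = 1 (Poisson)
    grid = [math.exp(-4.0 + 8.0 * i / 400) for i in range(401)]
    dp_ll, theta_hat = max((dp_loglik(data, mu, t), t) for t in grid)
    return {"theta_hat": theta_hat,
            "aic_poisson": 2 * 1 - 2 * pois_ll,
            "aic_double_poisson": 2 * 2 - 2 * dp_ll}

over = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 12, 0, 0, 15, 1, 0, 2, 20, 0, 4]
under = [3, 3, 4, 3, 4, 3, 3, 4, 3, 4]
for name, d in (("over", over), ("under", under)):
    print(name, compare_aic(d))
```

Holding mu at the sample mean is a simplification to keep the sketch short. A real analysis would maximise over both parameters and add covariates to the mean, as gnlm does.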
