Monday, October 30, 2006

mixed model

Hi Barbara,
I guess you want to fit a linear mixed model. The nlme() function is specific to non-linear models, but the same nlme package also provides lme() for linear ones.

Did you try this code, which assumes a normally distributed response:

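library(nlme)
# random intercept per country; variables are assumed to be available in the workspace
model1<-lme(mfreq~avergTemp+altitude, random=~1|country)
summary(model1)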

or this code for a Poisson response:

library(MASS)
model2<-glmmPQL(mfreq~avergTemp+altitude, random=~1|country, family=poisson)
summary(model2)

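library(lme4)
# in current lme4 versions glmer() replaces lmer(..., family=, method="Laplace"); Laplace is its default
model3<-glmer(mfreq~avergTemp+altitude+(1|country), family=poisson)
summary(model3)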

but first you should check whether country is a factor!
summary(country)
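
A small sketch of what that check can look like, assuming (hypothetically) that country was read in as numeric codes; the values below are made up for illustration:

```r
# Hypothetical example: country stored as numeric codes (made-up values)
country <- c(1, 2, 1, 3, 2)
is.factor(country)           # FALSE: R treats these codes as numbers
summary(country)             # quantiles, not counts per country

country <- factor(country)   # convert to a grouping factor
is.factor(country)           # TRUE
summary(country)             # now the number of observations per level
```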

Cheers, Tom

count data and dispersion

First of all,
long ago I picked up the idea that corrections for overdispersion are only necessary when the scale parameter is above 3 to 4. The idea is understandable: you should only correct for overdispersion if it is likely to be present. But the exact reason for the "3 to 4" limit is now unclear to me.
I tried to locate where I got the idea from, and it seems to be from here. I did not find any confirmation of that rule of thumb anywhere else.
However, Lindsey (1999) suggests, based on the analysis of examples, that corrections are necessary when the overdispersion parameter is at least two.
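
To compare a data set against such thresholds, the dispersion can be estimated from a fitted Poisson GLM. The following is only a sketch on simulated data (x, y and all parameter values are made up, not taken from any real analysis); it computes the Pearson estimate and the cruder deviance-based ratio:

```r
# Simulated overdispersed counts (illustrative data only)
set.seed(1)
n <- 200
x <- runif(n)
mu <- exp(0.5 + 1.2 * x)
y <- rnbinom(n, mu = mu, size = 1.5)   # negative binomial: variance exceeds the mean

fit <- glm(y ~ x, family = poisson)

# Pearson estimate of the dispersion (scale) parameter
pearson_disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
# Cruder estimate: residual deviance over residual degrees of freedom
deviance_disp <- deviance(fit) / df.residual(fit)

pearson_disp
deviance_disp

# A quasi-Poisson fit reports the same Pearson-based estimate
fit_q <- glm(y ~ x, family = quasipoisson)
summary(fit_q)$dispersion
```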

You should keep in mind, however, that overdispersion is impossible in some cases (so you should then never correct for it):

- when the dependent variable is a Bernoulli 0 - 1 variable
- when the maximal model considered is the saturated model

Overdispersion can also be caused by

- using the wrong link function
- a missing covariate
- the necessity to transform covariates
- outliers

so you should check these potential causes.
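
As an illustration of the missing covariate case, here is a sketch on simulated data (x1, x2 and the coefficients are invented for the example). The counts are truly Poisson given both covariates, so any apparent overdispersion comes from leaving one out:

```r
set.seed(3)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.5 * x1 + 1 * x2))   # Poisson given x1 and x2

# Pearson chi-square divided by residual degrees of freedom
pearson_disp <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

disp_missing <- pearson_disp(glm(y ~ x1, family = poisson))       # x2 left out
disp_full    <- pearson_disp(glm(y ~ x1 + x2, family = poisson))  # both covariates

disp_missing   # well above 1
disp_full      # close to 1
```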

Concerning underdispersion, Venables and Ripley (2002) show that dispersion estimates indicating underdispersion can be caused by small counts. See e.g. MASS4 p.208.
The ratio of residual deviance to degrees of freedom can be seriously biased downward for extreme p (and small n) in the case of binomial data, and for low lambda in the case of Poisson data.
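
A quick simulation can make this bias visible. The sketch below is only an illustration (the helper mean_ratio and the chosen lambda, n and nsim values are my own assumptions); it fits intercept-only Poisson models to data that are exactly Poisson, so the ratio should be near 1 if it were unbiased:

```r
set.seed(42)

# Average of residual deviance / residual df over repeated intercept-only fits
mean_ratio <- function(lambda, n = 100, nsim = 500) {
  ratios <- replicate(nsim, {
    y <- rpois(n, lambda)               # data with no over- or underdispersion
    fit <- glm(y ~ 1, family = poisson)
    deviance(fit) / df.residual(fit)
  })
  mean(ratios)
}

ratio_low  <- mean_ratio(0.2)   # small counts: ratio falls clearly below 1
ratio_high <- mean_ratio(10)    # larger counts: ratio is close to 1

ratio_low
ratio_high
```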

Instead of relying on rules of thumb, one can also model counts with distributions which allow both under- and overdispersion.
Some possibilities are the double binomial (Poisson) distribution and the multiplicative binomial (Poisson) distribution. Using AIC, you can then incorporate the decision to correct for under(over)dispersion into your model selection procedure.
You can fit GLMs with these distributions in R, using the "gnlm" library by Jim Lindsey. It is not easy to work with.
I have adapted some examples from Lindsey (1999), Models for Repeated Measurements, to indicate how you can fit such models.
The R input code is here, and the data file here.
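
For readers without gnlm at hand, the AIC logic itself can be sketched with a stand-in. Note the swap: the example below uses the negative binomial from MASS (glm.nb), which only handles overdispersion, not the double or multiplicative Poisson discussed above, and the data are simulated with made-up parameters:

```r
library(MASS)

set.seed(7)
n <- 300
x <- runif(n)
y <- rnbinom(n, mu = exp(1 + x), size = 2)   # simulated overdispersed counts

fit_pois <- glm(y ~ x, family = poisson)     # no dispersion correction
fit_nb   <- glm.nb(y ~ x)                    # stand-in model allowing overdispersion

# The model with the lower AIC is preferred; the extra parameter is penalised
AIC(fit_pois, fit_nb)
```

If the Poisson model had the lower AIC, that would be an argument for not correcting at all.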

Monday, October 16, 2006

GLM course

The GLM course started this morning. Thomas Tully, Wolf Mooij and Tom Van Dooren are the instructors. All participants have been invited to become members of this blog.
Tom