Monday, October 30, 2006

mixed model

Hi Barbara,
I guess you want to fit a linear mixed model. nlme() is specifically for non-linear models; for a linear mixed model, use lme() from the same package.

Did you try this code, which assumes a normally distributed response:

library(nlme)  # provides lme()
model1<-lme(mfreq~avergTemp+altitude, random=~1|country)

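or one of these for a Poisson response:

library(MASS)  # provides glmmPQL()
model2<-glmmPQL(mfreq~avergTemp+altitude, random=~1|country, family=poisson)

library(lme4)  # provides lmer(); current lme4 versions use glmer(..., family=poisson) instead
lmer(mfreq~avergTemp+altitude+(1|country), family=poisson, method="Laplace")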

but first you should check whether country is a factor!
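
A minimal sketch of that check, using a small made-up data frame. The name `dat` and all of its values are hypothetical; only the column names follow the models above.

```r
# Hypothetical toy data; only the column names come from the models above.
dat <- data.frame(
  mfreq     = c(3, 5, 2, 7, 4, 6),
  avergTemp = c(12.1, 14.3, 9.8, 15.0, 11.2, 13.7),
  altitude  = c(200, 150, 800, 90, 450, 300),
  country   = c("NL", "NL", "FR", "FR", "DE", "DE"),
  stringsAsFactors = FALSE
)

is.factor(dat$country)               # FALSE: country is stored as character
dat$country <- factor(dat$country)   # convert it to a grouping factor
is.factor(dat$country)               # TRUE
nlevels(dat$country)                 # number of groups for the random effect
```

If country were coded as numbers, the same factor() call would stop it from being treated as a numeric covariate.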

Cheers, Tom
Hi Tom,
I'm having trouble getting my mixed model to run. The dependent variable gives frequencies, the fixed variables are numbers and the random variable consists of categories:
model1<-nlme(mfreq~avergTemp+altitude+country, fixed=avergTemp+altitude~1,
but then I get the error message:
Error in, eval(parse(text = paste("~1",deparse(groups[[2]]), : Invalid formula for groups.
Unfortunately, I can't figure out, either from Crawley or from the help files, what sort of parameter or function I need in this case for the random variable. Any suggestions?
Thanks a lot, Barbara

count data and dispersion

First of all,
long ago I picked up the idea that corrections for overdispersion are only necessary when the scale parameter is above 3 to 4. The idea is understandable: you should only correct when overdispersion is likely to be present. But the exact reason for the "3 to 4" limit is now unclear to me.
I tried to locate where I got the idea from, and it seems to be from here. I did not find any confirmation of that rule of thumb anywhere else.
However, Lindsey (1999) suggests, based on the analysis of examples, that corrections are necessary when the overdispersion parameter is at least two.
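
To compare a fitted model against such thresholds you first need an estimate of the scale (dispersion) parameter. A minimal sketch on simulated data, where the negative binomial settings, sample size and variable names are assumptions chosen only for illustration:

```r
set.seed(1)
n  <- 500
x  <- runif(n)
mu <- exp(0.5 + x)
# Simulated overdispersed counts: negative binomial, variance = mu + mu^2/size
y  <- rnbinom(n, mu = mu, size = 2)

fit <- glm(y ~ x, family = poisson)

# Pearson estimate of the dispersion: sum of squared Pearson residuals / residual df
phi_pearson  <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
# Cruder alternative: residual deviance / residual df
phi_deviance <- deviance(fit) / df.residual(fit)

# A quasipoisson fit reports the Pearson-based estimate in its summary
fit_q     <- glm(y ~ x, family = quasipoisson)
phi_quasi <- summary(fit_q)$dispersion

c(pearson = phi_pearson, deviance = phi_deviance, quasi = phi_quasi)
```

The quasipoisson fit has the same coefficients as the Poisson fit; only the standard errors are scaled by the square root of the estimated dispersion.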

You should keep in mind, however, that overdispersion is impossible in some cases (so you should then never correct for it):

- when the dependent variable is a Bernoulli 0 - 1 variable
- when the maximal model considered is the saturated model

Overdispersion can also be caused by

- using the wrong link function
- a missing covariate
- the necessity to transform covariates
- outliers

so you should check these potential causes.

Concerning underdispersion, Venables and Ripley (2002) show that dispersion estimates indicating underdispersion can be caused by small counts. See e.g. MASS4 p.208.
The ratio of residual deviance to degrees of freedom can be seriously biased downward: for extreme p (and small n) in the case of binomial data, and for low lambda in the case of Poisson data.
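
A small simulation can illustrate the low-lambda case. This is only a sketch: the value lambda = 0.1, the sample size and the intercept-only model are assumptions, not taken from MASS.

```r
set.seed(42)
n <- 20000
y <- rpois(n, lambda = 0.1)   # genuine Poisson counts, so the true dispersion is 1

fit <- glm(y ~ 1, family = poisson)

# Residual deviance / residual df: falls well below 1 when counts are this small
dev_ratio     <- deviance(fit) / df.residual(fit)
# Pearson statistic / residual df: stays close to 1 for the same fit
pearson_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

c(deviance = dev_ratio, pearson = pearson_ratio)
```

A deviance ratio far below 1 here reflects the small counts, not real underdispersion, since the data were generated from a Poisson distribution.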

Instead of relying on rules of thumb, one can also model counts with distributions that allow both under- and overdispersion.
Some possibilities are the double binomial (Poisson) distribution and the multiplicative binomial (Poisson) distribution. Using AIC, you can then incorporate the decision to correct for under- or overdispersion into your model selection procedure.
You can fit GLMs with these distributions in R, using the "gnlm" library by Jim Lindsey. It is not easy to work with.
I have adapted some examples from Lindsey (1999), Models for Repeated Measurements, to indicate how you can fit such models.
The R input code is here, and the data file here.

Tuesday, October 24, 2006

count data and underdispersion

Hi Tom!
I have two questions;
1) If you have count data with underdispersion, from which value of the dispersion parameter should you choose to use the quasipoisson distribution? Is there a rule of thumb for underdispersion, like with overdispersion?
2) If you have count data with a high mean (17) and 'good' variance on the left but a short tail on the right: is it enough to use the quasipoisson distribution to correct for this, i.e. can you use the quasipoisson distribution when there is underdispersion on only one side? Or should you transform your data and use the normal distribution?

Monday, October 16, 2006

GLM course

The GLM course started this morning. Thomas Tully, Wolf Mooij and Tom Van Dooren are the instructors. All participants have been invited to become members of this blog.