## Thursday, November 30, 2006

### planned comparisons

Dag Tom,

het idee is om echt planned comparisons te doen, waardoor de error rate niet hoeft te worden aangepast (zoals wel gebeurt bij de simultaneous CI's methode).

Ik vond devolgende manier (op 't net) om planned comparisons te doen, wat denk je daarvan?;

cmat <- rbind("HL v LH" = c(0,1, -1, 0),"HH v LH/HL"=c(2,-1,-1,0), "LL v LH/HL"=c(0,-1,-1,2))
library(gmodels)
fit <-glm(ei2~kruising , contrasts=list("kruising"=make.contrasts(cmat)))
summary(fit)

groet
Nicola

## Tuesday, November 28, 2006

### planned comparisons

Hi Nicola,
you told me it's data on selection experiments. Are the data rally three blocks, meaning crosses between selection lines started from three different base populations?

If you write it like this:
model<-aov(egg~block+block:cross)
there there are no main effects for cross, and you only want to compare crosses within blocks. That means that you only want contrasts between different groups in the same block.

To compare these groups, first look at the coefficients of the model, and what they represent.
Reformulating the model like this probably makes comparisons easier:
model<-aov(egg~block:cross)
because all estimates are means per level.

With four crosses per block, are the coefficients of block 1, the first four coefficients in the model?
Say, you want to make simultaneous confidence intervals for all pairwise comparisons for this block:

# contrast matrix

cmatrix<-matrix(rep(0,6*4),nrow=6)
cmatrix[1,1]<-1;cmatrix[1,2]<--1;
cmatrix[2,1]<-1;cmatrix[2,3]<--1;
cmatrix[3,1]<-1;cmatrix[3,4]<--1;
cmatrix[4,2]<-1;cmatrix[4,3]<--1;
cmatrix[5,2]<-1;cmatrix[5,4]<--1;
cmatrix[6,3]<-1;cmatrix[6,4]<--1;

cmatrix

# when this matrix is multiplied with the vector of coefficients,
# every element of the result, is an estimated difference between two means.

# library(multcomp)

# psubset selects the parameters of the block you want to investigate.

simint(model,psubset=1:4,cmatrix=cmatrix)

You should probably make a contrast matrix for all 18 pairwise comparisons simultaneously.

cmatrix<-matrix(rep(0,3*6*4*3),nrow=3*6)
cmatrix[1,1]<-1;cmatrix[1,2]<--1;
cmatrix[2,1]<-1;cmatrix[2,3]<--1;
cmatrix[3,1]<-1;cmatrix[3,4]<--1;
cmatrix[4,2]<-1;cmatrix[4,3]<--1;
cmatrix[5,2]<-1;cmatrix[5,4]<--1;
cmatrix[6,3]<-1;cmatrix[6,4]<--1;

cmatrix[7,5]<-1;cmatrix[7,6]<--1;
cmatrix[8,5]<-1;cmatrix[8,7]<--1;
cmatrix[9,5]<-1;cmatrix[9,8]<--1;
cmatrix[10,6]<-1;cmatrix[10,7]<--1;
cmatrix[11,6]<-1;cmatrix[11,8]<--1;
cmatrix[12,7]<-1;cmatrix[12,8]<--1;

cmatrix[13,9]<-1;cmatrix[13,10]<--1;
cmatrix[14,9]<-1;cmatrix[14,11]<--1;
cmatrix[15,9]<-1;cmatrix[15,12]<--1;
cmatrix[16,10]<-1;cmatrix[16,11]<--1;
cmatrix[17,10]<-1;cmatrix[17,12]<--1;
cmatrix[18,11]<-1;cmatrix[18,12]<--1;

simint(model,psubset=1:12,cmatrix=cmatrix)

## Wednesday, November 22, 2006

### planned comparisons in a nested ANOVA

Hi Tom and others,

How do you perform planned comparisons within a nested ANOVA? I want to look at the differences between levels of the nested factor.

When trying to build the appropriate contrast coefficient, I couldn't make matrices which were built from a combination of 2 factors (so main factor:nested factor). Is this the right way forward at all? And if so, how? A bookreference would be fine too!
Or would it be easier to just do t-tests?
Also, in the booklet of the course it was mentioned you could "construct appropriate independent variables" for planned comparisons. Is this the same as constructing contrast coefficients?

cheers
Nicola

## Wednesday, November 15, 2006

### Multinomial

Hi Tessa,

1) how do I define my distribution in a glm when I have multinomial data (5 patches)?

Well, the glm() function cannot handle multinomial data, but function multinom() from the "nnet" library can.
Another package which fits a multinomial probit model is "MNP", there is also the "multinomrob" library. I think multinom() is easiest to use.

2) the same question but for a generalized linear mixed model (I would like to include time as a factor)?

When you need to add a random effect, "MNP" is the only option in R as far as I can see. Things become complicated in that case. Research on multinomial random effects models is still ongoing.
Some references: Multinomial logit random effects models, Statistical Modelling (2001) (J. Hartzel, A. Agresti, and B. Caffo); The analysis of ordered categorical data: An overview and a survey of recent developments, invited discussion paper for the Spanish Statistical Journal, TEST (2005) (I. Liu and A. Agresti).

3) In Crawley they talk about GAM’s (general additive models). What I understand is that for these models you don’t need to define a distribution because they use non parametric smoothers. Can these models be used for multinomial data?

Generalized additive models use non-parametric smoothers to model a non-linear dependence of a response variable on a predictor variable.
The distribution of the response variable is among the list of usual suspects: binomial, poisson, normal, gamma.
It has been claimed that package "gbm" (R) or "MART" (S-Plus) can do this for multinomial data, but I haven't tried those yet.

4) Spatial distributions are difficult. Package "spatstat" could contain the solution. I am not very familiar with analysing spatial patterns.

## Wednesday, November 08, 2006

### multinomial distribution

Hoi Tom and others

I have a couple of questions about how to analyse my data. The data are on spatial distributions in 2 predatory mite species. The mites were given a choice between five patches. I would like to test for differences in distribution when both species are present and compare it with a control where only one species is present.

Questions:

1) how do I define my distribution in a glm when I have multinomial data (5 patches)?

2) the same question but for a generalized linear mixed model (I would like to include time as a factor)?

3) In Crawley they talk about GAM’s (general additive models). What I understand is that for these models you don’t need to define a distribution because they use non parametric smoothers. Can these models be used for multinomial data?

4) The mites lay eggs during the experiment. I would like to know how the spatial distribution of the eggs, the eggs of the second species and the second species itself influence the spatial distribution of one mite. I thought about ancova, but my variables are not independent, so I suppose the correct way is to do several correlations? Or maybe multivariate analysis? I am also interested in interactions, so I wondered if it is possible to take this information in my model

Tessa

### random effects

Hi Fransesc,
you seem to have only a small number of levels for the plot effects.
The consequence will be that estimates of variance between these random effects are very imprecise, and that it is not very solid to state that these levels are a representative sample from a large population of plot levels.
I would fit all effects as fixed.
Continuous variables such as time cannot define levels random effects, but they can appear in random regressions. Anyway, if you want to fit higher-order models of time, all you have to do is make new variables, like
timesq<-time^2
and add them to the model.
In a model with interactions of these time variables with treatment (or other factors), different polynomials of time will be fit for each treatment group.
Hope that helps,
Tom

### mixed effects

Hi Tom, and the rest of the experts ;-)

I've been struggling quite a bit to fit a mixed effects model to my data. It is a RCB (or CR) design, with repeated measurements (RM) and, as said, both fixed and random effects. Treatment(fixed) = 2, plot (random) = 3 and nested in treatment. also there's replicate (random) = 2 and nested in plot, but this is the lowest level of variability, so I won't include this term.
Now here come the uncertainties: Time...I think is fixed...and treatment is also nested in time, because in the course of time the effect of the treatment seems to be different. The graph shows a large hump, which makes me suspect a higher-order term (time^2 or even time^3) . My problem is that I can't seem to find a relatively clear and simple explanation on how to deal with this. Crawley certainly doesn't treat it as it does the glm's. I reckon it goes a bit too far for a glm...
I hope I explained it clear enough...

greetz,
Francesc

## Monday, November 06, 2006

### variable lengths

Hi Cas,
something has gone wrong with attaching the data, or calculating new variables from old ones. The error message you get means that different variables in your model have different lenghts (so there's a mismatch).
for example, for a model of the form
glm(y~x+z)
you can check the lengths of x,z and y by
length(y)
length(x)
length(z)
The numbers returned should all be the same for a correct model specification. That makes it easy to spot where things went wrong.
good luck, Tom
Hi Tom and others of the GLM-R course,

I am trying to run a GLM, but get the error saying that "variable lenghts differ (found for 'density')".The variable ''density" simply is a number from 3 to 7. Do you know what I am doing wrong?

Cheers,

Cas Eikenaar