Tuesday, November 28, 2006

planned comparisons

Hi Nicola,
you told me it's data on selection experiments. Are the data rally three blocks, meaning crosses between selection lines started from three different base populations?

If you write it like this:
model<-aov(egg~block+block:cross)
there there are no main effects for cross, and you only want to compare crosses within blocks. That means that you only want contrasts between different groups in the same block.

To compare these groups, first look at the coefficients of the model, and what they represent.
Reformulating the model like this probably makes comparisons easier:
model<-aov(egg~block:cross)
because all estimates are means per level.

With four crosses per block, are the coefficients of block 1, the first four coefficients in the model?
Say, you want to make simultaneous confidence intervals for all pairwise comparisons for this block:

# contrast matrix

cmatrix<-matrix(rep(0,6*4),nrow=6)
cmatrix[1,1]<-1;cmatrix[1,2]<--1;
cmatrix[2,1]<-1;cmatrix[2,3]<--1;
cmatrix[3,1]<-1;cmatrix[3,4]<--1;
cmatrix[4,2]<-1;cmatrix[4,3]<--1;
cmatrix[5,2]<-1;cmatrix[5,4]<--1;
cmatrix[6,3]<-1;cmatrix[6,4]<--1;

cmatrix

# when this matrix is multiplied with the vector of coefficients,
# every element of the result, is an estimated difference between two means.

# library(multcomp)

# psubset selects the parameters of the block you want to investigate.

simint(model,psubset=1:4,cmatrix=cmatrix)

You should probably make a contrast matrix for all 18 pairwise comparisons simultaneously.

cmatrix<-matrix(rep(0,3*6*4*3),nrow=3*6)
cmatrix[1,1]<-1;cmatrix[1,2]<--1;
cmatrix[2,1]<-1;cmatrix[2,3]<--1;
cmatrix[3,1]<-1;cmatrix[3,4]<--1;
cmatrix[4,2]<-1;cmatrix[4,3]<--1;
cmatrix[5,2]<-1;cmatrix[5,4]<--1;
cmatrix[6,3]<-1;cmatrix[6,4]<--1;

cmatrix[7,5]<-1;cmatrix[7,6]<--1;
cmatrix[8,5]<-1;cmatrix[8,7]<--1;
cmatrix[9,5]<-1;cmatrix[9,8]<--1;
cmatrix[10,6]<-1;cmatrix[10,7]<--1;
cmatrix[11,6]<-1;cmatrix[11,8]<--1;
cmatrix[12,7]<-1;cmatrix[12,8]<--1;

cmatrix[13,9]<-1;cmatrix[13,10]<--1;
cmatrix[14,9]<-1;cmatrix[14,11]<--1;
cmatrix[15,9]<-1;cmatrix[15,12]<--1;
cmatrix[16,10]<-1;cmatrix[16,11]<--1;
cmatrix[17,10]<-1;cmatrix[17,12]<--1;
cmatrix[18,11]<-1;cmatrix[18,12]<--1;

simint(model,psubset=1:12,cmatrix=cmatrix)

Wednesday, November 15, 2006

Multinomial

Hi Tessa,


Answers:

1) how do I define my distribution in a glm when I have multinomial data (5 patches)?

Well, the glm() function cannot handle multinomial data, but function multinom() from the "nnet" library can.
Another package which fits a multinomial probit model is "MNP", there is also the "multinomrob" library. I think multinom() is easiest to use.


2) the same question but for a generalized linear mixed model (I would like to include time as a factor)?

When you need to add a random effect, "MNP" is the only option in R as far as I can see. Things become complicated in that case. Research on multinomial random effects models is still ongoing.
Some references: Multinomial logit random effects models, Statistical Modelling (2001) (J. Hartzel, A. Agresti, and B. Caffo); The analysis of ordered categorical data: An overview and a survey of recent developments, invited discussion paper for the Spanish Statistical Journal, TEST (2005) (I. Liu and A. Agresti).

3) In Crawley they talk about GAM’s (general additive models). What I understand is that for these models you don’t need to define a distribution because they use non parametric smoothers. Can these models be used for multinomial data?

Generalized additive models use non-parametric smoothers to model a non-linear dependence of a response variable on a predictor variable.
The distribution of the response variable is among the list of usual suspects: binomial, poisson, normal, gamma.
It has been claimed that package "gbm" (R) or "MART" (S-Plus) can do this for multinomial data, but I haven't tried those yet.

4) Spatial distributions are difficult. Package "spatstat" could contain the solution. I am not very familiar with analysing spatial patterns.

Wednesday, November 08, 2006

random effects

Hi Fransesc,
you seem to have only a small number of levels for the plot effects.
The consequence will be that estimates of variance between these random effects are very imprecise, and that it is not very solid to state that these levels are a representative sample from a large population of plot levels.
I would fit all effects as fixed.
Continuous variables such as time cannot define levels random effects, but they can appear in random regressions. Anyway, if you want to fit higher-order models of time, all you have to do is make new variables, like
timesq<-time^2
and add them to the model.
In a model with interactions of these time variables with treatment (or other factors), different polynomials of time will be fit for each treatment group.
Hope that helps,
Tom

Monday, November 06, 2006

variable lengths

Hi Cas,
something has gone wrong with attaching the data, or calculating new variables from old ones. The error message you get means that different variables in your model have different lenghts (so there's a mismatch).
for example, for a model of the form
glm(y~x+z)
you can check the lengths of x,z and y by
length(y)
length(x)
length(z)
The numbers returned should all be the same for a correct model specification. That makes it easy to spot where things went wrong.
good luck, Tom