[R] multistratum glm?

Sun Apr 18 13:47:02 CEST 2004

Hello,

I routinely use aov and the Error term to perform analyses of 
variance of experiments with 'within-subject' factors. I wonder whether 
a notion like 'multistratum models' exists for glm models when 
performing a logit analysis (without being 100% sure whether this would 
make sense).

I have data of an experiment where the outcome is a categorical variable:

20 individuals listened to 80 synthetic utterances (distributed in 4 
types) and were asked to classify them into four categories. (The 
variables in the data.frame are 'subject', 'sentence', 'type', and 
'response'.)

Here is the table of counts table(type,response):

       response
type  a   b  c   d
  a 181 166 42  11
  b  69 170 72  89
  c  90 174 75  61
  d  14 125 53 208

There are several questions of interest, such as, for example:

- are responses distributed in the same way for the different types?

- are the numbers of 'a' responses for the 'b' and 'c' types 
significantly different?

- is the proportion of 'd' over 'a' responses different for the 'b' and 
'c' types?

...  

(I want to make inferences for the population of potential subjects on 
the one hand, and on the population of potential sentences on the other 
hand).
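
To make these questions concrete, here is a minimal sketch that retypes 
the pooled counts from the table above and computes the descriptive 
quantities each question refers to. The object names (counts, a_counts, 
d_over_a) are made up for the illustration, and nothing here accounts 
for the repeated measurements per subject or per sentence.

```r
# Counts copied by hand from the table(type, response) output above.
counts <- matrix(
  c(181, 166, 42,  11,
     69, 170, 72,  89,
     90, 174, 75,  61,
     14, 125, 53, 208),
  nrow = 4, byrow = TRUE,
  dimnames = list(type = c("a", "b", "c", "d"),
                  response = c("a", "b", "c", "d"))
)

# Sanity check: every type collected the same number of responses.
rowSums(counts)

# Question 1: distribution of responses within each type (rows sum to 1).
round(prop.table(counts, margin = 1), 3)

# Question 2: number of 'a' responses for types 'b' and 'c'.
a_counts <- counts[c("b", "c"), "a"]
a_counts

# Question 3: ratio of 'd' to 'a' responses for types 'b' and 'c'.
d_over_a <- counts[c("b", "c"), "d"] / counts[c("b", "c"), "a"]
d_over_a
```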

If the responses were continuous, I would just run two one-way anovas: 
one with the factor type over the means by subject*type,
and the other with the factor type over the means by sentences (in 
type). And use t.test to compare between different pairs of types.
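
As a sketch of what I mean for the continuous case: the data below are 
simulated placeholders with the same layout as my experiment (20 
subjects, 80 sentences, assumed here to be 20 per type), and 'score' is 
a hypothetical continuous response, not something I measured.

```r
set.seed(1)  # simulated data only; not the experiment's results

# Same layout as the experiment: every subject hears every sentence.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
# Assumption: sentences 1-20 are type a, 21-40 type b, and so on.
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])
d$score <- rnorm(nrow(d))  # hypothetical continuous response

# By-subject analysis: one mean per subject x type cell,
# with type as a within-subject factor.
by_subj <- aggregate(score ~ subject + type, data = d, FUN = mean)
by_subj <- by_subj[order(by_subj$type, by_subj$subject), ]
fit_subj <- aov(score ~ type + Error(subject/type), data = by_subj)
summary(fit_subj)

# By-sentence analysis: one mean per sentence; sentences are nested
# in type, so type is a between-sentence factor.
by_item <- aggregate(score ~ sentence + type, data = d, FUN = mean)
fit_item <- aov(score ~ type, data = by_item)
summary(fit_item)

# Pairwise comparison of types 'b' and 'c' across subjects.
# Rows were sorted by subject within type, so the pairs line up.
t.test(by_subj$score[by_subj$type == "b"],
       by_subj$score[by_subj$type == "c"],
       paired = TRUE)
```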

Now, as the answers are categorical, I am not sure about the correct 
approach and how to use R to perform such an analysis.

I could treat response as a factor, use percentages of responses per 
subject in each cell of response*type, and run an anova on that 
[aov(percentage ~ response*type + Error(subject/(response*type)))]. 
But it seems incorrect to me to use the response of the subject as an 
independent variable (though I do not have a forceful argument).

Simple Chi-square tests are not the answer either, as a given subject 
contributed several times (80) to the counts in the table above.

My reading of MASS and of several other books suggests the use of 
logit/multinomial models when the response is categorical. But in all 
the examples provided, the units of analysis contribute only one 
measurement. Should I include the subject and sentence factors in the 
formula? But then they would be treated as fixed factors in the 
analysis, would they not?
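
To make the concern concrete, here is a sketch of what including subject 
in the formula would look like for one binary recoding of the response 
('a' versus anything else). The responses are simulated placeholders, 
not my data, and the names d and is_a are made up for the illustration.

```r
set.seed(1)  # simulated responses only; not the experiment's data

# Same layout as the experiment: every subject hears every sentence.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
# Assumption: sentences 1-20 are type a, 21-40 type b, and so on.
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])
d$response <- factor(sample(c("a", "b", "c", "d"), nrow(d), replace = TRUE))

# Binary recoding: did the listener answer 'a'?
d$is_a <- as.integer(d$response == "a")

# Logit model with subject entered as an ordinary (fixed) factor.
# sentence is nested in type, so adding it as well would alias
# the type effect; it is left out of this sketch.
fit <- glm(is_a ~ type + subject, family = binomial, data = d)

# One intercept, 3 type contrasts and 19 subject contrasts.
length(coef(fit))

# Sequential analysis of deviance for type, then subject.
anova(fit, test = "Chisq")
```

Each subject gets its own coefficient in this fit, so the inference is 
conditional on these 20 listeners rather than on a population of 
potential subjects.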

Any suggestion is welcome.

Christophe Pallier
www.pallier.org