[R] multistratum glm?
pallier at lscp.ehess.fr
Sun Apr 18 13:47:02 CEST 2004
I routinely use aov and the Error term to perform analyses of
variance of experiments with 'within-subject' factors. I wonder whether
a notion like 'multistratum models' exists for glm models when
performing a logit analysis (without being 100% sure whether this would
I have data of an experiment where the outcome is a categorical variable:
20 individuals listened to 80 synthetic utterances (distributed in 4
types) and were asked to classify them into four categories. (The variables
in the data.frame are 'subject', 'sentence', 'type', and 'response')
Here is the table of counts table(type,response):
      response
type    a   b   c   d
   a  181 166  42  11
   b   69 170  72  89
   c   90 174  75  61
   d   14 125  53 208
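
For anyone who wants to work with these counts directly, here is a minimal sketch that re-enters the table by hand and looks at the response proportions within each type. The object name 'counts' is only an illustration; the numbers are copied from the table above.

```r
# Counts copied from the table above: rows are 'type', columns are 'response'.
counts <- as.table(matrix(
  c(181, 166, 42,  11,
     69, 170, 72,  89,
     90, 174, 75,  61,
     14, 125, 53, 208),
  nrow = 4, byrow = TRUE,
  dimnames = list(type     = c("a", "b", "c", "d"),
                  response = c("a", "b", "c", "d"))
))

# Sanity check: 20 subjects x 20 sentences per type = 400 responses per row.
rowSums(counts)

# Proportion of each response within each type (each row sums to 1).
round(prop.table(counts, margin = 1), 2)
```

These proportions are purely descriptive: they pool all subjects and sentences and say nothing about variability between them.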
There are several questions of interest, such as, for example:
- are responses distributed in the same way for the different types?
- are the numbers of 'a' responses for the 'b' and 'c' types
- is the proportion of 'd' over 'a' responses different for the 'b' and
(I want to make inferences for the population of potential subjects on
the one hand, and on the population of potential sentences on the other
If the responses were continuous, I would just run two one-way anovas:
one with the factor type over the means by subject*type,
and the other with the factor type over the means by sentences (in
type). And use t.test to compare between different pairs of types.
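
To make the continuous case concrete, here is a rough sketch of the two analyses on simulated data. The data frame 'd' and the continuous variable 'score' are made up for illustration and are not part of my data; the layout (20 subjects, 80 sentences, 20 sentences per type) follows the description above. I assume here that type is treated as within-subject in the first analysis and between-sentence in the second.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical design: 20 subjects x 80 sentences, 20 sentences per type.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])
d$score <- rnorm(nrow(d))  # 'score' is a made-up continuous response

# By-subject analysis: one mean per subject*type; type varies within subject.
by_subj <- aggregate(score ~ subject + type, data = d, FUN = mean)
by_subj <- by_subj[order(by_subj$type, by_subj$subject), ]
summary(aov(score ~ type + Error(subject/type), data = by_subj))

# By-sentence analysis: one mean per sentence; type varies between sentences.
by_sent <- aggregate(score ~ sentence + type, data = d, FUN = mean)
summary(aov(score ~ type, data = by_sent))

# Pairwise comparison of types 'b' and 'c', paired within subject
# (rows are sorted by subject within type, so the pairs line up).
t.test(by_subj$score[by_subj$type == "b"],
       by_subj$score[by_subj$type == "c"],
       paired = TRUE)
```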
Now, as the answers are categorical, I am not sure about the correct
approach and how to use R to perform such an analysis.
I could treat response as a factor, and use percentages of responses per
subject in each cell of response*type,
and run an anova on that... [
aov(percentage ~ response*type + Error(subject/(response*type)))] But it
seems incorrect to me to use the response of the subject as an
independent variable (though I do not have a forceful argument).
Simple Chi-square tests are not the answer either, as a given subject
contributed several times (80) to the counts in the table above.
My reading of MASS and of several other books suggests the use of
logit/multinomial models when the response is categorical. But in all
the examples provided, the units of analysis contribute only one
measurement. Should I include the subject and sentences factors in the
formula? But then they would be treated as fixed factors in the
analysis, would they not?
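
To illustrate the concern, here is a minimal sketch on simulated data, reducing the response to a binary outcome ('d' versus anything else) so that a plain binomial glm applies. The data frame 'd', the variable 'is_d' and the simulated probabilities are assumptions made for illustration only.

```r
set.seed(2)  # simulated data, for illustration only

# Hypothetical design: 20 subjects x 80 sentences, 20 sentences per type.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])

# Made-up binary outcome: 1 if the subject answered 'd', 0 otherwise.
d$is_d <- rbinom(nrow(d), size = 1,
                 prob = ifelse(d$type == "d", 0.5, 0.15))

# Logit model with 'subject' entered directly in the formula.
fit <- glm(is_d ~ type + subject, family = binomial, data = d)

# 1 intercept + 3 type contrasts + 19 subject contrasts:
# each subject gets its own fixed coefficient.
length(coef(fit))
grep("^subject", names(coef(fit)), value = TRUE)
```

In this fit each of the 20 subjects is represented by its own parameter rather than by a variance component, which is the fixed-factor treatment I am worried about. Adding 'sentence' in the same way would also be aliased with 'type', since each sentence belongs to exactly one type.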
Any suggestion is welcome.