[R] Re: Enduring LME confusion… or Psychologists and Mixed-Effects

Christophe Pallier pallier at lscp.ehess.fr
Tue Aug 10 17:55:20 CEST 2004


>> Suppose I have a typical psychological experiment that is a 
>> within-subjects design with multiple crossed variables and a 
>> continuous response variable. Subjects are considered a random 
>> effect. So I could model
>> aov1 <- aov(resp ~ fact1*fact2 + Error(subj/(fact1*fact2)))
>> However, this only holds for orthogonal designs with equal numbers of 
>> observations and no missing values. These assumptions are easily 
>> violated so I seek refuge in fitting a mixed-effects model with the 
>> nlme library.

I suppose that you have, for each subject, enough observations to 
compute his/her average response for each combination of fact1 and 
fact2, no?
If this is the case, you can perform the analysis with the above formula 
on the data obtained by 'aggregate(resp, list(subj, fact1, fact2), mean)'.
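A minimal sketch of this two-step recipe, with made-up data (the data frame 'd' and all factor names are illustrative; the formula interface to aggregate() is used so the result keeps its column names):

```r
# Made-up balanced within-subject data: 8 subjects, 2 x 2 design,
# 3 raw observations per subject per cell.
d <- expand.grid(subj  = factor(1:8),
                 fact1 = factor(c("a", "b")),
                 fact2 = factor(c("x", "y")),
                 rep   = 1:3)
d$resp <- rnorm(nrow(d))

# One mean per subject per cell: the aggregated data are balanced
# again even if the raw cell counts were unequal.
m <- aggregate(resp ~ subj + fact1 + fact2, data = d, FUN = mean)

# The within-subject anova from the original question, run on the means
aov1 <- aov(resp ~ fact1 * fact2 + Error(subj / (fact1 * fact2)),
            data = m)
summary(aov1)
```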

This is an analysis with only *within-subject* factors, and there 
*cannot* be a problem of unequal numbers of observations when you have 
only within-subject factors (provided you have at least one 
observation for each subject in each condition).

I believe the problem of unequal numbers of observations only occurs 
when you have at least two crossed *between-subject* (group) factors.

Let's imagine you have two binary group factors (A and B), yielding four 
subgroups of subjects, and that for some reason you do not have the same 
number of observations in each subgroup.
Then there are several ways of defining the main effects of A and B.

In many cases, the most reasonable definition of the main effect of A is 
to take the average of A in B1 and in B2 (thus ignoring the cell sizes, 
i.e. weighting the four subgroups equally).
To test the null hypothesis of no difference in A when all groups are 
equally weighted, one common approach in psychology is to pretend that 
the number of observations in each group is equal to the harmonic mean of 
the numbers of observations in the subgroups. The sums of squares thus 
obtained can be compared with the error sum of squares in the standard 
anova to form an F-test.
This is called the "unweighted" approach.
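The harmonic-mean computation can be sketched as follows for a 2 x 2 between-subjects design; all cell sizes and cell means below are made up for illustration:

```r
# Unweighted-means sketch: pretend every cell has n equal to the
# harmonic mean of the (unequal) cell sizes.
n  <- c(12, 8, 10, 6)                # made-up unequal cell sizes
nh <- length(n) / sum(1 / n)         # harmonic mean of the cell sizes

cellmeans <- matrix(c(10, 12, 14, 13), nrow = 2,
                    dimnames = list(A = c("A1", "A2"),
                                    B = c("B1", "B2")))
rowm  <- rowMeans(cellmeans)         # equally weighted means of A1, A2
grand <- mean(cellmeans)             # unweighted grand mean
# sum of squares for the main effect of A under equal weighting
ss_A  <- nh * ncol(cellmeans) * sum((rowm - grand)^2)
```

The resulting ss_A would then be compared against the usual within-cell error sum of squares to form the F-test.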

This can easily be done 'by hand' in R, but there is another approach:

You get statistics equivalent to the unweighted anova when you use 
so-called 'type III' sums of squares (I read this in Howell, 1987, 
'Statistical Methods in Psychology', and in John Fox's book 'An R and 
S-Plus Companion to Applied Regression').

It is possible to get type III sums of squares using John Fox's 'car' package.
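The 'car' package provides Anova(fit, type = "III") for this. As a dependency-free sketch of the same idea, base R's drop1() on a model fitted with sum-to-zero contrasts gives the same marginal ("type III") F-tests; the data frame below is made up, with unequal cell sizes:

```r
# Made-up unbalanced between-subjects data
set.seed(1)
d <- data.frame(A = factor(rep(c("A1", "A2"), times = c(20, 14))),
                B = factor(rep(c("B1", "B2"), length.out = 34)))
d$y <- rnorm(34)

# Type III tests are only meaningful with sum-to-zero contrasts
old <- options(contrasts = c("contr.sum", "contr.poly"))
fit <- lm(y ~ A * B, data = d)
drop1(fit, scope = ~ A * B, test = "F")  # marginal F-test for each term
options(old)                             # restore previous contrasts
```

With the 'car' package installed, car::Anova(fit, type = "III") on the same fit reports the same tests.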


You can also compute the equally weighted cell means defining the effect 
of A by hand in R.


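For instance (a made-up illustration): the unweighted effect of A averages the cell means, whereas the raw means of A pool the observations and so weight cells by their sizes.

```r
# Made-up unbalanced between-subjects data
set.seed(2)
d <- data.frame(A = factor(rep(c("A1", "A2"), times = c(20, 14))),
                B = factor(rep(c("B1", "B2"), length.out = 34)))
d$y <- rnorm(34)

cellmeans <- tapply(d$y, list(d$A, d$B), mean)  # 2 x 2 table of cell means
rowMeans(cellmeans)     # equally weighted means for A1 and A2
tapply(d$y, d$A, mean)  # weighted (raw) means, for comparison
```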
I have seen some people advise against using 'type III' sums of squares, 
but I do not know their rationale. The important thing, it seems to me, 
is to know which null hypothesis is tested in a given test. If the type 
III sums of squares indeed test the effect on equally weighted means, 
they seem okay to me (when this is indeed the hypothesis I want to test).

Sorry for not answering any of your questions about the use of 'lme' (I 
hope others will), but I feel that 'lme' is not needed merely because of 
unequal cell frequencies (I am happy to be corrected if I am wrong). It 
seems to me that 'lme' is useful when some assumptions of the standard 
anova are violated (e.g. with repeated measurements when the sphericity 
assumption is false), or when you have several random factors.

Christophe Pallier
