[R] anova test for variables with different lengths

S Ellison S.Ellison at LGCGroup.com
Tue Oct 16 12:45:31 CEST 2012


 

>> I want to test whether the means of two different variables 
>> (with different numbers of observations) are the same. I am 
>> trying to use the anova test, but it doesn't seem to like that 
>> the numbers of observations differ:
>> 
>> a=c(1:5)
>> b=c(1:3)
>> aov_test=aov(a~b)
>> Error in model.frame.default(formula = a ~ b, 
>> drop.unused.levels = TRUE) :
>>   variable lengths differ (found for 'b')
>> 
> -----Original Message-----
> You may find this tutorial useful: 
> http://goanna.cs.rmit.edu.au/~fscholer/anova.php
> And you'll need the car package; but familiarize yourself 
> with Type I, II and III sums of squares models before running 
> the Anova; the tutorial explains these in detail.
> Hope it helps.

Sadly, I doubt that it will, though it would be good advice if the OP had got as far as formulating the model correctly. 

But they haven't. The OP has tried to model a variable of length 5 using a predictor of length 3. (In effect, what they have attempted is a simple linear regression of a on b, which requires both variables to have the same length.) This will not work, no matter what the OP does about types of SS. 

First, a t test would do this job, assuming normality. Incidentally, the variances differ, so the default t.test (Welch's test, which does not assume equal variances) will return a somewhat different result from anova, which effectively assumes equal variance by default.
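
As a minimal sketch of that difference, using the OP's two vectors (the object names welch and pooled are mine, chosen only for illustration):

```r
# The OP's data: two groups of different size
a <- 1:5
b <- 1:3

# Default t.test: Welch's test, var.equal = FALSE
welch <- t.test(a, b)

# Classical two-sample t test with pooled variance
pooled <- t.test(a, b, var.equal = TRUE)

# The two tests use different degrees of freedom and standard errors,
# so the statistics and p-values are not the same
welch$parameter
pooled$parameter
welch$p.value
pooled$p.value
```

Only the pooled version corresponds to what a one-way anova does with these data.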

Second, to use aov correctly, read ?formula and look at the examples there and in ?lm.

Then, if you want to get the same result as an equal variance t test using ANOVA, you'd have to concatenate the two groups and then model with a predictor indicating the groups. In this instance:

y <- c(a, b)
g <- factor( rep( letters[1:2], c(length(a), length(b)) ) )
summary( aov(y~g) )
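
To check the equivalence, here is a short self-contained sketch. The names fit, tt, F_val and p_aov are mine, purely for illustration; with two groups the anova F statistic should equal the square of the equal-variance t statistic, and the p-values should agree.

```r
# The OP's data, stacked into one response with a grouping factor
a <- 1:5
b <- 1:3
y <- c(a, b)
g <- factor(rep(letters[1:2], c(length(a), length(b))))

# One-way anova table (first element of the summary is the table itself)
fit <- summary(aov(y ~ g))[[1]]
F_val <- fit[["F value"]][1]
p_aov <- fit[["Pr(>F)"]][1]

# Equal-variance (pooled) t test on the same two groups
tt <- t.test(a, b, var.equal = TRUE)

# Both comparisons should return TRUE
all.equal(F_val, unname(tt$statistic)^2)
all.equal(p_aov, tt$p.value)
```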

Since this is a one-way problem, the type of SS won't matter, but in other cases it would be crucial to at least understand why - and to what extent - anova can be unsafe* on unbalanced data.
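
A small illustration of the problem, on made-up data: the data frame d, its values and the factor names A and B are entirely hypothetical and are not from the OP's question. With unequal cell counts, the sequential (Type I) sums of squares that anova() reports for lm fits depend on the order in which the terms enter the model.

```r
# Hypothetical unbalanced two-factor data (cell counts 3, 2, 1, 4)
d <- data.frame(
  y = c(10, 12, 11, 20, 22, 15, 30, 31, 29, 33),
  A = factor(rep(c("a1", "a2"), each = 5)),
  B = factor(c("b1", "b1", "b1", "b2", "b2",
               "b1", "b2", "b2", "b2", "b2"))
)

# Same additive model, terms entered in a different order
ss_AB <- anova(lm(y ~ A + B, data = d))
ss_BA <- anova(lm(y ~ B + A, data = d))

# Sum of squares attributed to A: first ignoring B, then adjusted for B
ss_AB["A", "Sum Sq"]
ss_BA["A", "Sum Sq"]
```

The two values differ, so the conclusion about A can depend on nothing more than how the formula was typed. In a balanced design the two orderings would agree.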


S Ellison

*"unsafe" reads as "actively dangerous" in this context.
