[R] Is there a fast way to do several hundred thousand ANOVA tests?

glen_b glnbrntt at gmail.com
Mon Aug 24 03:40:59 CEST 2009



The usual ways of avoiding explicit loops may give some speedup, though
you've already allocated the full results vector outside the loop, which is
probably a major saving. Others may have more detailed advice on that score.

Within the loop there are some speedups possible.

update() may allow you to avoid constantly recomputing the quantities
derived from the design matrix, which doesn't look like it's changing from
one column to the next.

see ?update

If you're just interested in the F statistic, more direct approaches that
avoid all the other calculation done by aov or lm may be substantially
faster; for example, you can work directly with the QR or Cholesky
decomposition of the design matrix, computed once outside the loop/"apply"
part. Faster (but less numerically stable) methods exist, e.g. ones based on
a SWEEP operator; if your ANOVA is balanced, then things might be done even
more directly.
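
As a rough sketch of the QR idea (not tested on data of your size): the
function name fstat_qr is made up for illustration, and it assumes a is a
numeric matrix with no missing values and b is a factor with one entry per
row of a. The design matrix is decomposed once, and the residual and total
sums of squares are then computed for all columns together.

```r
# Sketch: one-way ANOVA F statistics for every column of a matrix at once.
# fstat_qr is a hypothetical helper name; assumes no NAs in `a` and that
# length(b) == nrow(a).
fstat_qr <- function(a, b) {
  X  <- model.matrix(~ b)   # design matrix: intercept + treatment contrasts
  qx <- qr(X)               # QR decomposition, computed only once
  n  <- nrow(a)
  k  <- qx$rank             # number of estimated group parameters

  # Residual sum of squares for each column (qr.resid accepts a matrix)
  rss <- colSums(qr.resid(qx, a)^2)
  # Total sum of squares about each column's own mean
  tss <- colSums(sweep(a, 2, colMeans(a))^2)

  # F = (between-group mean square) / (residual mean square)
  ((tss - rss) / (k - 1)) / (rss / (n - k))
}

# Small simulated example (the sizes here are arbitrary, not your data)
set.seed(1)
b <- factor(rep(c("cond1", "cond2", "cond3"), length.out = 50))
a <- matrix(rnorm(50 * 200), nrow = 50)

f.stat.vec <- fstat_qr(a, b)

# Check one column against aov, as used in the original loop
f1 <- summary(aov(a[, 1] ~ b))[[1]][1, 4]
all.equal(f.stat.vec[1], f1)
```

With 800000 columns the intermediate matrices are as large as a itself, so
if memory is tight it may be worth calling the function on blocks of columns
rather than on the whole matrix in one go.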



big permie wrote:
> 
> Dear R users,
> 
> I have a matrix a and a classification vector b such that
> 
>> str(a)
> num [1:50, 1:800000]
> and
>> str(b)
> Factor w/ 3 levels "cond1","cond2","cond3"
> 
> I'd like to do an anova on all 800000 columns and record the F statistic
> for each test; I currently do this using
> 
> f.stat.vec <- numeric(ncol(a))
> 
> for (i in seq_len(ncol(a))) {
>   f.test.frame <- data.frame(nums = a[, i], cond = b)
>   aov.vox <- aov(nums ~ cond, data = f.test.frame)
>   # F value for the cond term, from the first row of the ANOVA table
>   f.stat <- summary(aov.vox)[[1]][1, 4]
>   f.stat.vec[i] <- f.stat
> }
> 
> The problem is that this code takes about 70 minutes to run.
> 
> Is there a faster way to do an anova & record the F stat for each column?
> 
> Any help would be appreciated.
> 
> Thanks
> Heath
> 
