[R] vectorizing ANOVA over a vectorized linear model

hadley wickham h.wickham at gmail.com
Mon Mar 8 07:38:18 CET 2010


Hi Mark,

Unless you are fitting millions of very very very simple models, I
doubt that extracting p-values is going to be a limiting factor in the
speed of your analysis.

Hadley
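
On the underlying question: calling anova() on the "mlm" object that lm() returns for a matrix response gives a multivariate test, not one table per gene. The per-gene sequential (Type I) F-tests can still be recovered without a loop from the fit's effects component, which is the same quantity anova() uses for a single-response fit. What follows is a minimal sketch under stated assumptions, not a drop-in replacement for anova(): the simulated data, the factor names A and B, and the helper anova_by_gene() are all hypothetical, and it assumes no missing values and no weights.

```r
set.seed(1)

# Hypothetical data: 200 genes in rows, 24 samples in columns,
# two crossed factors A (2 levels) and B (3 levels), 4 replicates per cell.
n_genes <- 200
design  <- expand.grid(rep = 1:4,
                       A = factor(c("a1", "a2")),
                       B = factor(c("b1", "b2", "b3")))
expr <- matrix(rnorm(n_genes * nrow(design)), nrow = n_genes)

# One call fits every gene: a matrix response yields an "mlm" object.
fit <- lm(t(expr) ~ A * B, data = design)

# Hypothetical helper: per-gene sequential F-test p-values for each term.
anova_by_gene <- function(fit) {
  p    <- fit$rank
  # Term index of each estimable coefficient (0 = intercept).
  asgn <- fit$assign[fit$qr$pivot[seq_len(p)]]
  eff  <- fit$effects[seq_len(p), , drop = FALSE]   # coefficients x genes
  keep <- asgn > 0                                  # drop the intercept

  # Sequential sum of squares per term and gene: squared effects summed by term.
  term_ss <- rowsum(eff[keep, , drop = FALSE]^2, asgn[keep])
  term_df <- as.vector(table(asgn[keep]))

  # Residual mean square per gene.
  rms <- colSums(fit$residuals^2) / fit$df.residual

  f    <- sweep(term_ss / term_df, 2, rms, "/")     # terms x genes
  pval <- matrix(pf(f, term_df, fit$df.residual, lower.tail = FALSE),
                 nrow = nrow(f))
  rownames(pval) <- attr(fit$terms, "term.labels")[sort(unique(asgn[keep]))]
  t(pval)                                           # genes x terms
}

pvals <- anova_by_gene(fit)
head(pvals)
```

For any single gene, the row of pvals should agree with the Pr(>F) column of anova(lm(expr[i, ] ~ A * B, data = design)), which is a convenient spot check before trusting the result on real data.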

On Mon, Mar 8, 2010 at 3:47 AM, Mark Kimpel <mwkimpel at gmail.com> wrote:
> Hadley,
>
> Thanks for pointing me to some good articles. Unfortunately, I have already
> read Holger's and my main concern is computational efficiency. The buzzword
> on this list regarding efficient code is "vectorization". I am, frankly,
> surprised that there is a way to vectorize analysis of complex models but
> not to extract p values from them. Dieter's reply points one towards using
> lapply, which in my experience allows for compact code but not an increase
> in efficiency (one of Holger's examples demonstrates this). Anyway, I cannot
> see how to go from Holger's fairly simple examples to one that involves a
> complex model with several factors and interactions.
>
> Limma, which does provide p values if contrasts are used, is blindingly fast
> but I believe Gordon Smyth has hard-coded most of this excellent package in
> C. I was hoping to achieve something similar without the use of the
> moderated t-statistics that Limma uses.
>
> Looks like I am stuck using loops with mclapply. Thank goodness for my
> Core i7!
>
> Mark
>
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Skype No Voicemail please
>
>
> On Sun, Mar 7, 2010 at 2:08 PM, hadley wickham <h.wickham at gmail.com> wrote:
>>
>> Hi Mark,
>>
>> If efficiency is a concern you might want to read "Computing Thousands
>> of Test Statistics Simultaneously in R" by Holger Schwender and Tina
>> Müller, http://stat-computing.org/newsletter/issues/scgn-18-1.pdf.
>>
>> If you just want to do it, see the examples in
>> http://had.co.nz/plyr/plyr-intro-090510.pdf.
>>
>> Hadley
>>
>> On Sun, Mar 7, 2010 at 7:03 PM, Mark Kimpel <mwkimpel at gmail.com> wrote:
>> > Is it possible to vectorize anova over the output of a vectorized lm? I
>> > have a gene expression matrix with each row being a gene and columns for
>> > samples. There are several factors with interactions. I can get p values
>> > by looping over the matrix with lm and anova, but I would like to make
>> > this as computationally efficient as possible. I am able to vectorize the
>> > lm command, but when I try to use anova on the resultant model object I
>> > get just one anova result.
>> >
>> > Is what I want to do possible? And, yes, I am quite conversant with Limma
>> > and other BioC packages, I have my reasons for wanting to use lm and
>> > anova.
>> >
>> > Thanks,
>> >
>> > Mark
>> >
>>
>>
>>
>> --
>> Assistant Professor / Dobelman Family Junior Chair
>> Department of Statistics / Rice University
>> http://had.co.nz/
>
>
