[BioC] combining GSA and lmFit

Dick Beyer dbeyer at u.washington.edu
Tue Apr 28 22:23:24 CEST 2009


Hi Gordon,

Thanks for sharing your views on this topic.

I was wondering, when you say "Thirdly, GSA computes p-values from permutation,
and permutation does not perform well for linear models," what is it about the
permutation approach that does not perform very well?  Is it due to the null
hypothesis being equality of distributions rather than equality of means?

I see that the Bioconductor package GSEAlm, which uses linear models with GSEA and relies on sample permutations, might have problems similar to those of combining GSA and lmFit.  I guess I was thinking a GSA/lmFit combination would be OK because people are already using GSEAlm.  But if the underlying null hypothesis is equality of distributions rather than the weaker null of equality of means, then that would be important to keep in mind when interpreting the resulting p-values.
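
To make the null-hypothesis question concrete, here is a minimal sketch of a two-group permutation test, written in plain Python rather than taken from GSA or GSEAlm. The function name perm_pvalue, the toy data, and the choice of the difference in means as the statistic are all assumptions for illustration. The point is that shuffling labels is only justified when the two groups are exchangeable under the null, which is the "equality of distributions" null rather than the weaker "equality of means" null.

```python
import random
from statistics import mean


def perm_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration only. Shuffling the pooled
    values assumes the group labels are exchangeable under the null,
    i.e. that both groups come from the same distribution.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            hits += 1
    # Count the observed labelling too, so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): one clearly shifted pair, one identical pair.
treated = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4]
control = [0.1, 0.3, -0.1, 0.2, 0.0, 0.4]
p_shifted = perm_pvalue(treated, control)
p_same = perm_pvalue([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

With covariates in a linear model there is no single obvious set of labels to shuffle, which may be part of the difficulty, but that is my guess and not something stated in the thread.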

I wonder if there would be a useful way to test the issues you raise: the effect on gene set analysis of using limma or SAM statistics, which depend on the whole ensemble of genes, and the effect of the moderated statistics of limma or SAM on the GSA standardization method. I'm not sure I understand the GSA steps well enough to know whether they are designed to take care of problems you might otherwise use a SAM-type statistic to deal with.
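
For reference, the two statistics can be written side by side to show where the dependence on the whole ensemble of genes comes from. The notation is chosen here for illustration and is not taken from the GSA code. For gene g, the SAM-type statistic adds a constant to the gene-wise standard error, and the limma moderated t shrinks the gene-wise residual variance toward a prior value:

```latex
\begin{align*}
d^{\mathrm{SAM}}_g &= \frac{\bar{x}_{g,1} - \bar{x}_{g,2}}{\mathrm{se}_g + s_0^{\mathrm{SAM}}}, \\
\tilde{t}_g &= \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}, \qquad
\tilde{s}_g^2 = \frac{\nu_0 s_0^2 + \nu_g s_g^2}{\nu_0 + \nu_g}.
\end{align*}
```

Here s_0^SAM is chosen from the distribution of the se_g over all genes, while the prior degrees of freedom nu_0 and prior variance s_0^2 (reported by limma as df.prior and s2.prior) are estimated from all genes by empirical Bayes. In both cases, changing the set of genes changes the statistic for every individual gene.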

Lots to think about.  Thanks very much for your comments.

Cheers,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D.	University of Washington
Tel.:(206) 616 7378	Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696	4225 Roosevelt Way NE, # 100
			Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

On Tue, 28 Apr 2009, Gordon K Smyth wrote:

> Dear Dick,
>
> Anything in GSA which works with the SAM statistic should also work fine
> with limma moderated t-statistics.
>
> However there are several issues that come to my mind which affect both
> statistics.  Firstly, both SAM and limma statistics depend on the whole
> ensemble of genes, i.e., they are not merely computed genewise.  This is
> unlike the floored mean statistics assumed in the GSA theory paper.  This
> has clear computational implications, but also could give rise to some
> theoretical issues.
>
> Secondly, it's not too clear to me whether it makes sense to compute
> regularized or moderated statistics after the standardization steps that GSA
> does.
>
> Thirdly, GSA computes p-values from permutation, and permutation does not
> perform well for linear models.
>
> These are simply my thoughts, which you asked for.  You may have ways around
> all these issues.
>
> Best wishes
> Gordon
>
>> Date: Sun, 26 Apr 2009 20:43:28 -0700 (PDT)
>> From: Dick Beyer <dbeyer at u.washington.edu>
>> Subject: [BioC] combining GSA and lmFit
>> To: Bioconductor <bioconductor at stat.math.ethz.ch>
>>
>> Hi All,
>>
>> I have extended the GSA code (http://www-stat.stanford.edu/~tibs/GSA/) to
>> include lmFit() from the limma package so as to have linear model
>> capabilities with GSA.  Basically, I'm using the modified t-statistic
>> values from lmFit just like the SAM-like t-statistic values are used in
>> the GSA code.
>>
>> I was wondering if anyone had any thoughts on whether this was, in
>> principle, an OK thing to be doing.  I am worrying about whether there is
>> an underlying issue I'm not aware of in using the moderated t-statistic
>> values from limma as opposed to the SAM t-statistic values, which use the
>> s0 term in the denominator.
>>
>> My tests on some microarray data I have show that in a qqplot of
>> t-statistic values from the two methods, they are in pretty close
>> agreement except for large t-values.
>>
>> If anyone knows of reasons not to be doing this or could point me to
>> places with possible explanations, I'd be very grateful.
>>
>> Cheers,
>> Dick
>


