[BioC] enrichment packages that accept t-stat (or related stat) as input

Gordon K Smyth smyth at wehi.EDU.AU
Fri Oct 25 01:55:57 CEST 2013


Dear Pekka,

Thanks for your comments and for your interest in the camera procedure.

In general, it is not possible to run the camera() or roast() procedures 
on pre-computed test statistics.  Both functions need to have all the 
expression data in order to estimate the inter-gene correlations.

I agree that the same set of variance inflation factors (VIFs) would apply 
to different comparisons for the same linear model, so in principle we 
could store the VIFs and reuse them for different contrasts on the same 
data.  The syntax for that would require quite a bit of thought, however, 
and you are the first to ask for it.  The camera() function is very fast 
as it is, so we are not planning to make such a modification in the very 
near term.
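
To illustrate why the VIF belongs to the linear model rather than to any 
one contrast, here is a minimal base-R sketch.  It is a simplified 
illustration and not limma's internal code: the simulated data, the 
gene_sets list and the set_vif() helper are all hypothetical, and it 
assumes the form VIF = 1 + (m - 1) * rho_bar, where m is the set size and 
rho_bar is the mean pairwise correlation of the residuals.

```r
# Simplified illustration in base R; not limma's internal implementation.
set.seed(1)
n_genes <- 200
n_samples <- 12
y <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples)  # simulated data
group <- factor(rep(c("A", "B", "C"), each = 4))
design <- model.matrix(~ group)

# Residuals of the gene-wise linear model. They depend on the data and
# the design matrix, but not on which contrast is tested afterwards.
res <- t(qr.resid(qr(design), t(y)))

# Hypothetical helper: mean pairwise residual correlation and the
# assumed VIF = 1 + (m - 1) * rho_bar for one gene set.
set_vif <- function(res, idx) {
  m <- length(idx)
  r <- cor(t(res[idx, , drop = FALSE]))  # m x m gene-gene correlations
  rho_bar <- mean(r[upper.tri(r)])       # mean over distinct gene pairs
  c(rho_bar = rho_bar, vif = 1 + (m - 1) * rho_bar)
}

gene_sets <- list(set1 = 1:20, set2 = 50:89)  # hypothetical index sets
vif_table <- t(sapply(gene_sets, set_vif, res = res))
```

Because res is computed once from the design, a table like vif_table 
could in principle be stored and reused for every contrast of that model.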

It may be possible to generalize camera() to F-statistics, but it will be 
a serious mathematical research project to work out the appropriate VIF 
and test modifications.  You won't be able to do that in a valid way 
simply by hacking the code.

Regarding write.fit(), you could simply use

   out <- camera(...)
   write.table(out, file="cameraresults.txt")

Would that satisfy your needs?

In recent limma releases, we have modified mroast() to give output as a 
data.frame, so that the format is more like that from camera(), as part of 
our continuing development of gene set methods.

Best wishes
Gordon


-------------- original message ---------------
[BioC] enrichment packages that accept t-stat (or related stat) as input
Pekka Kohonen pkpekka at gmail.com
Thu Oct 24 12:00:46 CEST 2013

Dear Juliet, Gordon,

I am also looking into using pre-computed statistics with camera, both to 
speed up computation for a web service and to allow statistics that are 
not currently supported by limma/camera (AFAIK), such as the F-statistic. 
So I am trying to decompose the camera function in limma so that it can 
use pre-computed statistics. I wonder if someone has already done so? 
Could the F-statistic (as reported by the write.fit function, for 
instance) be used in camera directly, or would that violate some 
statistical assumptions? Probably using the rank-based version is the 
safest option.

It seems to me that, to pre-compute as much as possible in limma (when 
the gene sets are not known in advance), you can pre-compute the 
limma/eBayes gene-wise statistics and the array weights. You still have 
to estimate the variance inflation factor for each gene set, but the same 
factor can be used for all the comparisons in the linear model.

It would be nice to have a "write.fit" type function for the gene-set 
tests as well. It is one of my favorite functions in limma.

I have also used GSVA with linear modelling for gene set testing, but I 
don't completely trust the statistical validity of the results. Maybe 
setting trend=TRUE would alleviate some concerns about the normality 
assumptions being violated. It also (apparently) needs at least 10 
samples to estimate the distribution of the gene set statistics, but that 
is OK for dose-response modelling.

Thank you, Gordon, for your work on limma! I am also finding "voom" to be 
a really nice function and have used it to analyze label-free proteomics 
experiments as well.

Best Regards,

Pekka

2013/8/30 Gordon K Smyth <smyth at wehi.edu.au>:
> Dear Juliet,
>
> Why not use the enrichment functions that are already part of the limma
> package?   See
>
>   ?roast
>   ?camera
>   ?romer
>
> and references therein.
>
> Best wishes
> Gordon
>
>
>> Message: 19
>> Date: Thu, 29 Aug 2013 20:43:04 -0400
>> From: Juliet Hannah <juliet.hannah at gmail.com>
>> To: Robert Castelo <robert.castelo at upf.edu>
>> Cc: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: Re: [BioC] enrichment packages that accept t-stat (or related
>>         stat) as input
>>
>> Hi Robert,
>>
>> Thanks for your response. I will look into it.
>>
>> Also, is it correct that GSVA always requires an expression matrix? It 
>> seems that it integrates with limma, so if I have done an analysis in 
>> limma, does this mean that I should be able to use GSVA for an 
>> enrichment analysis?
>>
>> Thanks,
>>
>> Juliet
>>
>>
>> On Thu, Aug 29, 2013 at 2:43 AM, Robert Castelo
>> <robert.castelo at upf.edu> wrote:
>>
>>> Juliet,
>>>
>>> i think the first 5 pages in the vignette entitled "Using Categories to
>>> Analyze Microarray Data" from the Category package:
>>>
>>> http://www.bioconductor.org/packages/release/bioc/html/Category.html
>>>
>>> may be doing what you are looking for.
>>>
>>> cheers,
>>> robert.
>>>
>>>
>>> On 08/28/2013 08:04 PM, Juliet Hannah wrote:
>>>
>>>> All,
>>>>
>>>> I am looking for a Bioconductor enrichment package that does 
>>>> something similar to GSEA for pre-computed test statistics. This 
>>>> method would not rely on a cutoff. That is, rather than passing an 
>>>> expression matrix, one can compute summaries outside of the package 
>>>> (such as a limma t), and then pass these. Any suggestions?
>>>>
>>>> Thanks,
>>>>
>>>> Juliet
>>>>
>>>>
>>> --
>>> Robert Castelo, PhD
>>> Associate Professor
>>> Dept. of Experimental and Health Sciences
>>> Universitat Pompeu Fabra (UPF)
>>> Barcelona Biomedical Research Park (PRBB)
>>> Dr Aiguader 88
>>> E-08003 Barcelona, Spain
>>> telf: +34.933.160.514
>>> fax: +34.933.160.550
