Wu, Xiwei XWu at coh.org
Tue Mar 29 18:35:50 CEST 2005

Thanks a lot, Naomi. Your result is very interesting. I am wondering whether
the number of DE genes in your simulated dataset will affect the results.
I tested SAM and Limma using the same dataset (but without the knowledge of
what genes should be DE). I know this is not the best way to compare
different methods, but I just want to get some idea. At the level of 0.05
FDR, SAM finds a lot more DE genes than Limma. However, with some other
datasets, SAM and Limma perform similarly. In addition, I found that using
the median FDR or the mean FDR in SAM makes a big difference for some
datasets, but not for others.
Is the message that there is no general answer to this question, because it
depends on the dataset? Any comments?
In addition, is there a guideline for the minimum number of replicates that
should be used with SAM? I assume that with a small number of replicates, the
permutation does not mean much.
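
One way to see why few replicates limit a permutation approach is to count
the distinct relabelings. The sketch below assumes an unpaired two-class
design with equal group sizes and a two-sided test applied to one gene at a
time; the function names are illustrative and are not part of SAM. SAM draws
on all genes when estimating its FDR, so its null is less coarse than this
per-gene count suggests.

```python
from math import comb


def distinct_relabelings(n_per_group: int) -> int:
    """Number of ways to split 2n arrays into two groups of n (assumed balanced design)."""
    return comb(2 * n_per_group, n_per_group)


def smallest_two_sided_p(n_per_group: int) -> float:
    """Smallest per-gene two-sided permutation p-value.

    With balanced groups, swapping the two labels gives the same absolute
    statistic, so at least 2 relabelings are as extreme as the observed one.
    """
    return 2 / distinct_relabelings(n_per_group)


for n in (2, 3, 4):
    print(n, distinct_relabelings(n), round(smallest_two_sided_p(n), 4))
```

With 2 replicates per group there are only 6 relabelings, against 70 with 4
replicates, so a single gene's permutation null is very coarse at n = 2.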


-----Original Message-----
From: Naomi Altman [mailto:naomi at stat.psu.edu] 
Sent: Monday, March 28, 2005 8:34 PM
To: Wu, Xiwei; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] SAM vs LIMMA vs EBAM

I have not tried EBAM, but I did do this experiment with SAM and LIMMA on a
data set I simulated from an actual data set.

On these data, the SAM statistic and LIMMA F-test gave almost identical
ordering of the genes.  However, the FDR adjustment was too stringent for
SAM (i.e. the true FDR was lower than SAM's estimate) and was too liberal
for LIMMA.
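
Because the truly DE genes are known in a simulation, the realized false
discovery proportion of a gene list can be compared with the FDR a method
reports. A minimal sketch follows; the function name and the gene identifiers
are hypothetical, and the FDR itself is the average of this proportion over
repeated simulations.

```python
def false_discovery_proportion(called, truly_de):
    """Fraction of called genes that are not truly DE (0.0 if nothing is called)."""
    called = set(called)
    if not called:
        return 0.0
    false_positives = called - set(truly_de)
    return len(false_positives) / len(called)


# Hypothetical example: 4 genes called, 3 of them truly DE.
fdp = false_discovery_proportion(["g1", "g2", "g3", "g4"], ["g1", "g2", "g3"])
print(fdp)
```

A method is too stringent in the sense above when this proportion, averaged
over simulations, falls below the FDR it reports, and too liberal when it
exceeds it.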

This was not a big study.  I took my gene means and variances from an actual
study, and then added either normal or t-4 errors and a couple of levels of
differential expression.
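
A simulation of this kind could be sketched roughly as follows. This is not
the code used for the study described above: the gene means, standard
deviations, shift sizes, and all function names are placeholders, and
rescaling the t-4 errors to unit variance is an assumption made here so that
both error types share the same per-gene variance.

```python
import math
import random


def t4_error(rng):
    """One draw from Student's t with 4 df, rescaled to unit variance (assumption)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(2.0, 2.0)  # chi-square with 4 df = Gamma(shape 2, scale 2)
    t = z / math.sqrt(chi2 / 4.0)
    return t / math.sqrt(2.0)  # Var(t_4) = 4 / (4 - 2) = 2


def simulate_gene(mean, sd, shift, n_reps, rng, heavy_tails=False):
    """Return (control, treated) replicate values for one gene."""
    def draw():
        return t4_error(rng) if heavy_tails else rng.gauss(0.0, 1.0)

    control = [mean + sd * draw() for _ in range(n_reps)]
    treated = [mean + shift + sd * draw() for _ in range(n_reps)]
    return control, treated


def simulate_dataset(gene_params, de_shifts, n_reps, seed=0, heavy_tails=False):
    """gene_params: list of (mean, sd); de_shifts: {gene index: shift} for DE genes."""
    rng = random.Random(seed)
    return [
        simulate_gene(m, s, de_shifts.get(i, 0.0), n_reps, rng, heavy_tails)
        for i, (m, s) in enumerate(gene_params)
    ]


# Placeholder inputs: two genes, the second one DE with a shift of 1.0.
data = simulate_dataset([(8.0, 0.3), (10.0, 0.5)], {1: 1.0}, n_reps=4)
```

Keeping the DE shifts in a separate mapping records which genes are truly DE,
which is what later allows the reported FDR to be checked against the truth.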

The sample sizes I used were very small - 2 or 4 replicates with 22000
genes.  Results were much, much, much better with 4 replicates than with 2.


At 08:48 PM 3/28/2005, Wu, Xiwei wrote:
>Hi, BioC Members,
>I have a general question on identifying DE genes. Since there are many 
>ways to do this, I am wondering whether people have compared methods 
>such as SAM, EBAM, and LIMMA by applying them to the same dataset. Of 
>course, they have different assumptions and different models, but 
>should they always give similar results (assuming the parameter 
>settings are optimized to get similar number of DE genes)? Is it better 
>to get a common list of genes using three different methods? Do I have 
>more confidence in this common list of genes than using a single method?

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
