[BioC] Siggenes and interpreting SAM output

Ettinger, Nicholas nicholas-ettinger at uiowa.edu
Wed Jan 18 19:04:38 CET 2006

Hello all!

I am having trouble deciding whether my SAM output is reliable enough to
take seriously, or whether I should set it aside and try another method.

Here is a sample of my output (a two-class paired design: infected vs.
non-infected cells from three different human donors, normalized with
GCRMA):

Gene set #1
         D-value    Q-value    R-fold
Gene 1   -179.854   0.410743   0.598795
Gene 2   -84.0229   0.417071   0.775385
Gene 3   -82.5916   0.417071   0.858212

Gene set #2
         D-value    Q-value    R-fold
Gene 4   86.7573    0.152039   1.00977
Gene 5   86.3523    0.152039   1.09908
Gene 6   -83.4252   0.152039   0.547529

How should I interpret these results?

I have several questions:
(1) I don't understand why the D-values are so large in magnitude while
the R-fold values stay close to 1. I realize that the D-values come from
the observed d(i) versus the expected d(i), but I thought d(i) was
related to the fold change?

(2) For gene set #1, would the correct interpretation be that these
genes are changing by "large" amounts but that since the q-values are so
high (the FDR was around 0.4), they are not reliable?

(3) Similarly, for gene set #2, would the correct interpretation be that
since the q-values are much lower (the FDR was about 0.2), one can be
more confident that these are real changes? And if you were comfortable
with a 20% chance that the genes you chose were false positives, could
you then move on to verifying them?

(4) What FDR and q-value thresholds are journals accepting for
publication? That seems to me to be the deciding question. In your
experience, how low do the FDRs and q-values have to be before journal
editors take the results seriously? Any suggestions or advice would be
most welcome.
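Questions (1) to (3) turn on two quantities that SAM computes. The Python sketch below is a simplified illustration under stated assumptions, not the siggenes implementation. It computes a SAM-style relative difference for paired data, d = (mean within-donor log2 difference) / (standard error + s0), and a permutation-based FDR at a single cutoff. The fixed fudge factor s0, the symmetric cutoff on |d| (SAM proper uses its delta rule on the observed vs. expected plot), the pi0 default, the function names, and all data values are hypothetical.

```python
import math
import statistics

# --- SAM-style d statistic for a paired design (simplified sketch) ---

def sam_paired_d(diffs, s0):
    """Relative difference d = mean / (standard error + s0) for one gene.

    diffs: per-donor log2(infected) - log2(non-infected) values.
    s0:    fudge factor; SAM estimates it from all genes, here it is a
           fixed, hypothetical value.
    """
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / (se + s0)


def fold_change(diffs):
    """Fold change on the original scale, 2 ** mean(log2 difference).

    One common definition (an assumption); siggenes' R-fold may be
    computed differently.
    """
    return 2 ** statistics.mean(diffs)


# --- Permutation-based FDR at a single cutoff (simplified sketch) ---

def count_called(d_values, cutoff):
    """Genes whose |d| reaches the cutoff (symmetric cutoff, not SAM's delta rule)."""
    return sum(1 for d in d_values if abs(d) >= cutoff)


def estimate_fdr(observed_d, permuted_d, cutoff, pi0=1.0):
    """Estimated FDR = pi0 * median(calls under permutation) / observed calls.

    observed_d: d-values computed with the real infected/non-infected labels.
    permuted_d: one list of d-values per relabelling (within-donor swaps).
    pi0:        assumed share of truly unchanged genes; 1.0 is conservative.
    """
    called = count_called(observed_d, cutoff)
    if called == 0:
        return 0.0  # no genes called, so no false discoveries among the calls
    # Median number of genes "called" when the labels carry no real signal.
    false_calls = statistics.median(count_called(p, cutoff) for p in permuted_d)
    return min(1.0, pi0 * false_calls / called)


if __name__ == "__main__":
    S0 = 0.01  # hypothetical fudge factor
    gene_a = [-0.740, -0.738, -0.742]  # hypothetical: modest, very consistent change
    gene_b = [-2.0, -0.5, -3.5]        # hypothetical: large, noisy change
    for name, diffs in [("A", gene_a), ("B", gene_b)]:
        print(f"gene {name}: d = {sam_paired_d(diffs, S0):8.2f}, "
              f"fold = {fold_change(diffs):.3f}")

    # Made-up d-values for 10 genes and 3 label permutations.
    observed = [5.1, -4.8, 3.9, -3.5, 2.2, 1.0, -0.7, 0.3, -2.9, 4.4]
    permutations = [
        [0.5, -1.2, 3.6, -0.4, 0.9, -2.1, 0.2, 1.1, -0.3, 0.8],
        [-3.7, 0.6, 1.4, -0.9, 2.0, 0.1, -1.5, 0.7, -0.2, 1.3],
        [0.4, -0.8, 1.9, 3.8, -1.0, 0.3, -0.6, 1.2, -4.0, 0.5],
    ]
    called = count_called(observed, 3.0)
    fdr = estimate_fdr(observed, permutations, 3.0)
    print(f"{called} genes called, estimated FDR = {fdr:.2f}, "
          f"i.e. about {fdr * called:.1f} expected false positives among them")
```

In this toy example gene A, with a fold change of about 0.6, gets a far larger |d| than gene B, with a fold change of 0.25, because its three donors agree so closely that the standard error is tiny. The FDR is an estimate of the expected fraction of false positives among all genes called at a cutoff, and a gene's q-value is, roughly, the lowest FDR at which that gene is called. With three donors there are only 2**3 = 8 ways to swap labels within donors, so permutation-based estimates of this kind are necessarily coarse.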

Thank you!!
University of Iowa

