[BioC] siggenes parameters

Tue Jun 28 13:11:15 CEST 2011

Hi Assa,

siggenes is described in

http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf

on pages 45ff, and with a bit more technical details in

https://eldorado.tu-dortmund.de/bitstream/2003/23306/1/diss_schwender.pdf

Section 6.3.2. Another possibility is to look into the code for sam, in particular into the functions d.stat and siggenes:::stats.cal.

The reason why the FDR is so low in the third example is the very, very low estimate for p0. Take a look at the number of falsely called genes (False) for about the same number of called genes. These numbers are close to each other. The FDR is estimated by

p0 * False/Called

so the FDR gets very low if p0 is very small. 

I would thus not trust the results of the third analysis, as the p0 estimation has not worked properly here. Not really sure why, but the reason might be that too many null genes are removed when use.dm=FALSE.

Best,
Holger

-------- Original-Nachricht --------
> Datum: Tue, 28 Jun 2011 09:41:53 +0200
> Von: Assa Yeroslaviz <frymor at gmail.com>
> An: bioconductor <bioconductor at stat.math.ethz.ch>, holger.schw at gmx.de
> Betreff: siggenes parameters

> Hi everybody,
> 
> I have a question about the behavior of siggenes in R (2.12) with
> different
> parameters.
> Maybe I need to understand better how siggenes calculated the
> differentially
> regulated genes, but I would like to ask you for your help in
> understanding
> it.
> 
> Maybe you can also direct me to where I can find the answers for this
> problem(s)
> 
> I am running a sam analysis of wild type vs. mutant in drosophila miRNA
> microarrays.
> 
> I first read the files into an affybatch and than normalized them (RMA).
> After normalizing the arrays I extracted all non-drosophila miRNA from my
> expressionSet and ran the sam analysis only with the 186 dme-miRNA probes.
> 
> cl <- c(0,0,0,1,1,1) # "wt1", "wt2", "wt3", "mt1", "mt2", "mt3"
> sam.out <- sam(dme_rma,cl, var.equal=FALSE, B=100, include.zero=FALSE,
> gene.names=Probe_names, na.replace=FALSE, rand=123)
> 
> I run the sam analysis with different parameters and got *very* different
> results.
> 
> the first run was with the default parameters. Here I got  these results:
> SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
> 
>  s0 = 0.3176  (The 55 % quantile of the s values.)
> 
>    Delta    p0 False Called   FDR cutlow cutup j2  j1
> 1   0.10 0.882 74.85     92 0.717 -0.130 2.135 91 186
> 2   0.11 0.882 74.85     92 0.717 -0.130 2.135 91 186
> 3   0.12 0.882 28.25     39 0.639 -0.575 2.135 38 186
> 4   0.13 0.882 24.35     38 0.565 -0.633 2.135 37 186
> ...
> 10  0.19 0.882 21.65     37 0.516 -0.672 2.135 36 186
> 11  0.20 0.882  12.4     24 0.456 -0.860 2.135 23 186
> 12  0.21 0.882  12.4     24 0.456 -0.860 2.135 23 186
> 13  0.22 0.882  12.4     24 0.456 -0.860 2.135 23 186
> 14  0.23 0.882  10.9     22 0.437 -0.905 2.135 21 186
> 15  0.24 0.882   0.4      1 0.353   -Inf 2.135  0 186
> ...
> 20  0.29 0.882   0.4      1 0.353   -Inf 2.135  0 186
> 21  0.30 0.882     0      0     0   -Inf   Inf  0 187
> 22  0.31 0.882     0      0     0   -Inf   Inf  0 187
> 
> I get here a very high FDR value for very few miRNAs.
> 
> For my second run I added the parameter R.fold = 1.2, to exclude all
> miRNAs
> under this value from the analysis.
> Here I get much better results with lower FDR values:
> 
>  Number of variables having a fold change >= 1.2 or <= 0.8333 : 90
> 
>  s0 = 0.0477  (The 0 % quantile of the s values.)
> 
>    Delta    p0 False Called    FDR cutlow cutup j2 j1
> 1    0.1 0.378  55.5     82 0.2557 -0.708 0.425 45 54
> 2    0.2 0.378 52.25     80 0.2467 -0.708 0.562 45 56
> 3    0.3 0.378  48.4     79 0.2314 -0.708 0.676 45 57
> 4    0.5 0.378  24.9     48 0.1960 -0.708 2.504 45 88
> 5    0.6 0.378  23.7     45 0.1990 -0.708   Inf 45 91
> 6    0.7 0.378 10.65     31 0.1298 -1.170   Inf 31 91
> 7    0.8 0.378  6.75     25 0.1020 -1.513   Inf 25 91
> 8    1.0 0.378  0.85      4 0.0803 -2.712   Inf  4 91
> 9    1.1 0.378  0.25      1 0.0944 -4.810   Inf  1 91
> 10   1.2 0.378     0      0      0   -Inf   Inf  0 91
> 
> I than ran a third sam with the parameter use.dm=FALSE. This gave me again
> different results with much lower FDR values:
> 
>  Number of variables having a fold change >= 1.2 or <= 0.8333 : 97
> 
>  s0 = 0.1488  (The 5 % quantile of the s values.)
> 
>    Delta    p0 False Called     FDR cutlow cutup j2 j1
> 1    0.1 0.009  53.5     81 0.00612 -0.541 0.545 48 65
> 2    0.2 0.009 50.65     80 0.00587 -0.541 0.616 48 66
> 3    0.3 0.009 26.85     48 0.00519 -0.541   Inf 48 98
> 4    0.4 0.009 26.85     48 0.00519 -0.541   Inf 48 98
> 5    0.5 0.009 26.85     48 0.00519 -0.541   Inf 48 98
> 6    0.6 0.009     0      0       0   -Inf   Inf  0 98
> 7    0.7 0.009     0      0       0   -Inf   Inf  0 98
> 8    0.8 0.009     0      0       0   -Inf   Inf  0 98
> 9    0.9 0.009     0      0       0   -Inf   Inf  0 98
> 10   1.0 0.009     0      0       0   -Inf   Inf  0 98
> 
> 
> As you can see my obvious problem.  Which is the better (right???) way of
> running this analysis.
> I think, to understand that, one needs to know how SAM works. I don't
> understand why I get such different results in the FDR between the three
> possibilities.
> 
> As far as I understand it, the lower the FDR value, the better. But how
> can
> it be, that I have in the third run over 80 miRNAs with such a low FDR,
> while I have only few DE miRNAs in the first run with a much higher FDR
> value?
> 
> I would appreciate any help,
> 
> thanks
> Assa

--