[BioC] siggenes parameters
Holger Schwender
holger.schw at gmx.de
Tue Jun 28 13:11:15 CEST 2011
Hi Assa,
siggenes is described in
http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf
on pages 45ff, and with a bit more technical details in
https://eldorado.tu-dortmund.de/bitstream/2003/23306/1/diss_schwender.pdf
Section 6.3.2. Another possibility is to look into the code for sam, in particular into the functions d.stat and siggenes:::stats.cal.
The reason why the FDR is so low in the third example is the very, very low estimate for p0. Take a look at the number of falsely called genes (False) for about the same number of called genes. These numbers are close to each other. The FDR is estimated by
p0 * False/Called
so the FDR gets very low if p0 is very small.
I would thus not trust the results of the third analysis, as the p0 estimation has not worked properly here. Not really sure why, but the reason might be that too many null genes are removed when use.dm=FALSE.
Best,
Holger
-------- Original-Nachricht --------
> Datum: Tue, 28 Jun 2011 09:41:53 +0200
> Von: Assa Yeroslaviz <frymor at gmail.com>
> An: bioconductor <bioconductor at stat.math.ethz.ch>, holger.schw at gmx.de
> Betreff: siggenes parameters
> Hi everybody,
>
> I have a question about the behavior of siggenes in R (2.12) with
> different
> parameters.
> Maybe I need to understand better how siggenes calculated the
> differentially
> regulated genes, but I would like to ask you for your help in
> understanding
> it.
>
> Maybe you can also direct me to where I can find the answers for this
> problem(s)
>
> I am running a sam analysis of wild type vs. mutant in drosophila miRNA
> microarrays.
>
> I first read the files into an affybatch and than normalized them (RMA).
> After normalizing the arrays I extracted all non-drosophila miRNA from my
> expressionSet and ran the sam analysis only with the 186 dme-miRNA probes.
>
> cl <- c(0,0,0,1,1,1) # "wt1", "wt2", "wt3", "mt1", "mt2", "mt3"
> sam.out <- sam(dme_rma,cl, var.equal=FALSE, B=100, include.zero=FALSE,
> gene.names=Probe_names, na.replace=FALSE, rand=123)
>
> I run the sam analysis with different parameters and got *very* different
> results.
>
> the first run was with the default parameters. Here I got these results:
> SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
>
> s0 = 0.3176 (The 55 % quantile of the s values.)
>
> Delta p0 False Called FDR cutlow cutup j2 j1
> 1 0.10 0.882 74.85 92 0.717 -0.130 2.135 91 186
> 2 0.11 0.882 74.85 92 0.717 -0.130 2.135 91 186
> 3 0.12 0.882 28.25 39 0.639 -0.575 2.135 38 186
> 4 0.13 0.882 24.35 38 0.565 -0.633 2.135 37 186
> ...
> 10 0.19 0.882 21.65 37 0.516 -0.672 2.135 36 186
> 11 0.20 0.882 12.4 24 0.456 -0.860 2.135 23 186
> 12 0.21 0.882 12.4 24 0.456 -0.860 2.135 23 186
> 13 0.22 0.882 12.4 24 0.456 -0.860 2.135 23 186
> 14 0.23 0.882 10.9 22 0.437 -0.905 2.135 21 186
> 15 0.24 0.882 0.4 1 0.353 -Inf 2.135 0 186
> ...
> 20 0.29 0.882 0.4 1 0.353 -Inf 2.135 0 186
> 21 0.30 0.882 0 0 0 -Inf Inf 0 187
> 22 0.31 0.882 0 0 0 -Inf Inf 0 187
>
> I get here a very high FDR value for very few miRNAs.
>
> For my second run I added the parameter R.fold = 1.2, to exclude all
> miRNAs
> under this value from the analysis.
> Here I get much better results with lower FDR values:
>
> Number of variables having a fold change >= 1.2 or <= 0.8333 : 90
>
> s0 = 0.0477 (The 0 % quantile of the s values.)
>
> Delta p0 False Called FDR cutlow cutup j2 j1
> 1 0.1 0.378 55.5 82 0.2557 -0.708 0.425 45 54
> 2 0.2 0.378 52.25 80 0.2467 -0.708 0.562 45 56
> 3 0.3 0.378 48.4 79 0.2314 -0.708 0.676 45 57
> 4 0.5 0.378 24.9 48 0.1960 -0.708 2.504 45 88
> 5 0.6 0.378 23.7 45 0.1990 -0.708 Inf 45 91
> 6 0.7 0.378 10.65 31 0.1298 -1.170 Inf 31 91
> 7 0.8 0.378 6.75 25 0.1020 -1.513 Inf 25 91
> 8 1.0 0.378 0.85 4 0.0803 -2.712 Inf 4 91
> 9 1.1 0.378 0.25 1 0.0944 -4.810 Inf 1 91
> 10 1.2 0.378 0 0 0 -Inf Inf 0 91
>
> I than ran a third sam with the parameter use.dm=FALSE. This gave me again
> different results with much lower FDR values:
>
> Number of variables having a fold change >= 1.2 or <= 0.8333 : 97
>
> s0 = 0.1488 (The 5 % quantile of the s values.)
>
> Delta p0 False Called FDR cutlow cutup j2 j1
> 1 0.1 0.009 53.5 81 0.00612 -0.541 0.545 48 65
> 2 0.2 0.009 50.65 80 0.00587 -0.541 0.616 48 66
> 3 0.3 0.009 26.85 48 0.00519 -0.541 Inf 48 98
> 4 0.4 0.009 26.85 48 0.00519 -0.541 Inf 48 98
> 5 0.5 0.009 26.85 48 0.00519 -0.541 Inf 48 98
> 6 0.6 0.009 0 0 0 -Inf Inf 0 98
> 7 0.7 0.009 0 0 0 -Inf Inf 0 98
> 8 0.8 0.009 0 0 0 -Inf Inf 0 98
> 9 0.9 0.009 0 0 0 -Inf Inf 0 98
> 10 1.0 0.009 0 0 0 -Inf Inf 0 98
>
>
> As you can see my obvious problem. Which is the better (right???) way of
> running this analysis.
> I think, to understand that, one needs to know how SAM works. I don't
> understand why I get such different results in the FDR between the three
> possibilities.
>
> As far as I understand it, the lower the FDR value, the better. But how
> can
> it be, that I have in the third run over 80 miRNAs with such a low FDR,
> while I have only few DE miRNAs in the first run with a much higher FDR
> value?
>
> I would appreciate any help,
>
> thanks
> Assa
--
More information about the Bioconductor
mailing list