[BioC] DESeq and number of replicates required for RNA-Seq
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Mon Jun 14 18:57:01 CEST 2010
Thanks for the reply.
The issue isn't necessarily low expressing genes, but perhaps high expressing genes with a small(ish) fold change. DESeq seems to report as significant only those differences with high fold changes.
Contrast this to limma for microarrays, where small fold changes can be reported as significant.
For whatever reason, the transcriptomic community have become fixated on "two-fold" as some kind of standard cut-off. Now, I'm not fixated on that, but the example in DESeq reports 428 significant genes with an estimated fold change at FDR 5%; however, NONE of these has a log2 fold change in the range -2 to 2. The minimum positive logFC is 2.18 (about 4.5-fold up-regulation), and the negative logFC closest to zero is -2.49 (about 5.6-fold down-regulation).
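For readers checking the arithmetic, the conversion between a log2 fold change and a linear fold change is simply 2 raised to the absolute log2 value. A minimal sketch in Python (the function name `fold_change` is illustrative and is not part of DESeq or limma):

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change to the magnitude of the linear fold change."""
    return 2.0 ** abs(log2_fc)


# The values discussed in this thread: the common "two-fold" cut-off (log2FC = 1),
# a log2FC of 2, and the two extremes reported for the DESeq example data.
for lfc in (1.0, 2.0, 2.18, -2.49):
    direction = "up" if lfc > 0 else "down"
    print(f"log2FC {lfc:+.2f} -> {fold_change(lfc):.2f}-fold {direction}")
```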
So what I am concerned about is finding genes, either highly or lowly expressed, that differ by a small fold change - say two-fold.
From: Naomi Altman [naomi at stat.psu.edu]
Sent: 14 June 2010 17:42
To: michael watson (IAH-C); bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] DESeq and number of replicates required for RNA-Seq
The issue is a mix of expression level and sample size. For count
data, the power is higher when the expression is higher. Also, the
p-values are discrete - the lower the total read count, the fewer
values are possible, which messes up the FDR estimation.
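
The discreteness point can be illustrated with a deliberately simplified stand-in for the real test: an exact two-sided binomial test in which the total reads for a gene are split between two equally sequenced conditions. This is an assumption made for illustration only and is not the negative binomial test DESeq uses; the function name is hypothetical.

```python
from math import comb


def possible_p_values(total: int) -> list[float]:
    """All achievable two-sided p-values of an exact binomial test (p = 0.5)
    when `total` reads are split between two conditions."""
    probs = [comb(total, k) * 0.5 ** total for k in range(total + 1)]
    p_values = set()
    for observed in probs:
        # Sum the probability of every outcome at most as likely as the observed one.
        p = sum(q for q in probs if q <= observed * (1 + 1e-9))
        p_values.add(round(min(p, 1.0), 12))
    return sorted(p_values)


for total in (4, 10, 40):
    pv = possible_p_values(total)
    print(f"total reads = {total:2d}: {len(pv)} distinct p-values, smallest = {pv[0]:.3g}")
```

With only 4 reads in total, the smallest achievable p-value under this simplified test is 0.125, so such a gene can never be called significant at 0.05 no matter how the reads are split.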
Of course, understanding the problem does not necessarily suggest a
solution. But sample sizes will need to be large (or you need to
sequence very deeply) if you want to detect differential expression
in low expressing genes.
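
As a rough guide to how replicate number and sequencing depth trade off, here is a back-of-envelope sketch based on a normal approximation to the log of a negative binomial mean (variance = mean + dispersion * mean^2). This is not DESeq's own procedure; the function name, the mean counts and the dispersion value are all hypothetical, and `alpha` is a per-gene significance level, so a genome-wide FDR threshold would call for a much smaller value.

```python
from math import ceil, log
from statistics import NormalDist


def replicates_per_group(log2_fc, mean_count, dispersion, alpha=0.05, power=0.8):
    """Approximate replicates per condition needed to detect a given log2 fold
    change for one gene, using a normal approximation on the log scale."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    log_fc = log2_fc * log(2)                      # natural-log fold change
    # Delta-method variance of the log mean for a single sample:
    # a Poisson part (1 / mean) plus the biological dispersion.
    per_sample_var = 1.0 / mean_count + dispersion
    n = 2 * (z_alpha + z_power) ** 2 * per_sample_var / log_fc ** 2
    return ceil(n)


# Hypothetical dispersion of 0.1; compare low and high mean counts and
# a two-fold versus four-fold change.
for mean_count in (10, 100, 1000):
    for log2_fc in (1, 2):
        n = replicates_per_group(log2_fc, mean_count, dispersion=0.1)
        print(f"mean count {mean_count:4d}, log2FC {log2_fc}: about {n} replicates per group")
```

Under these assumptions, deeper sequencing only shrinks the 1 / mean term; once that is small, the dispersion dominates and only more replicates help.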
At 09:45 AM 6/14/2010, michael watson (IAH-C) wrote:
>This follows on slightly from my experimental design thread.
>Having worked through the vignette for DESeq, it seems to work
>well. However, for the TagSeqExample.tab data set, when using an
>FDR cut off of 0.05, what we see is that we only find differential
>expression for large fold changes - an average of log2 fold change
>of 5 for up-regulated, and log2 fold change of -5 for
>down-regulated. There are very few significant results that even go
>as far down as 2 or -2 - which is still a 4-fold change.
>So, the question is, how many replicates must we have to get more
>sensitive results? Say down to log2FC of 1? (two-fold up or down regulated)?
>I can calculate this by using DESeq's own estimates of variance to
>approximate replicates for T and N in the example data, and keep
>going until my significant results start to hit a logFC of 1, but I
>wanted to know if anyone else had done this yet?
Naomi S. Altman 814-865-3791 (voice)
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111