[BioC] RNASeq, differential expression between group, and large variance within groups

Gordon K Smyth smyth at wehi.EDU.AU
Wed Mar 2 04:50:39 CET 2011


Dear Simon and Laurent,

I can't agree with Simon's statement that edgeR does no better than DESeq 
at downweighting tags with extreme variances, or that this has to do with 
the number of replicates.  While extreme cases like the example that 
Laurent mentions may need special intervention, edgeR was specifically 
designed to downweight highly variable tags, and this is just as effective 
with few replicates as with many.

Let's simulate a dataset with Laurent's tag as the first one:

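   library(edgeR)
   # 9999 background tags with Poisson(50) counts in 6 libraries
   y <- matrix(rpois(9999*6,lambda=50),9999,6)
   # Put the outlier tag (one huge count, zeros elsewhere) in row 1
   y <- rbind(c(0,0,0,92207,0,0),y)
   rownames(y) <- 1:10000
   # Two groups of three libraries each
   d <- DGEList(counts=y,group=factor(c(1,1,1,2,2,2)))
   # Tagwise dispersions with weak squeezing towards the common value
   d2 <- estimateTagwiseDisp(d,prior.n=1)
   # Exact test using the tagwise (not common) dispersions
   et <- exactTest(d2,common.disp=FALSE)
   topTags(et)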

This analysis finds no tag to be differentially expressed, just as you 
would want if you view the large count for tag1 to be an outlier.

(Here I have chosen prior.n to be lower than the default.  The default 
value prior.n=10 does result in tag1 being identified as differentially 
expressed.  It is hard to give universal guidelines for how best to 
choose prior.n.)
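
As a rough illustration of why the choice of prior.n matters here, the 
sketch below uses base R only to compute a simple method-of-moments 
dispersion for the outlier tag.  It assumes the negative binomial variance 
form var = mu + phi*mu^2 and is not the estimator that edgeR itself uses; 
the names phi_tag and phi_pois are just for this example.

```r
# Counts for the outlier tag within group 2 (taken from the example above)
x <- c(92207, 0, 0)
m <- mean(x)
v <- var(x)

# Method-of-moments dispersion under var = mu + phi * mu^2
# (an illustrative assumption, not edgeR's weighted likelihood estimate)
phi_tag <- (v - m) / m^2

# For comparison: a background tag simulated as Poisson(50), as in the
# simulation above; its dispersion should be close to zero
set.seed(1)
z <- rpois(3, lambda = 50)
phi_pois <- max(0, (var(z) - mean(z)) / mean(z)^2)

print(c(outlier = phi_tag, poisson = phi_pois))
```

The outlier tag's own dispersion comes out close to 3, whereas the 
background tags have dispersion near zero.  With a small prior.n the 
tagwise estimate stays near the tag's own large value, so the group 
difference looks like noise; a larger prior.n gives more weight to the 
common likelihood and pulls the estimate towards the small common value.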

Best wishes
Gordon

------ ORIGINAL MESSAGE --------
[Bioc-sig-seq] RNASeq, differential expression between group, and large variance within groups
Simon Anders anders at embl.de
Mon Feb 21 20:34:00 CET 2011

Dear Laurent

On 02/21/2011 03:36 PM, Laurent Gautier wrote:

> We are looking at tag-based RNASeq data, and after running popular 
> packages for finding differential expression (edgeR, and DEGseq) we were 
> looking at the actual counts for the significant ones.
>
> We are observing a somewhat extreme variance within each group for those 
> (say one sample with high count for gene X while others have zero 
> count).
>
> For example, gene X flagged as differentially expressed has the
> following counts (adjusted p-value with DGESeq is 9.401479e-10):
> 0 grp_A
> 0 grp_A
> 0 grp_A
> 92207 grp_B
> 0 grp_B
> 0 grp_B
>
> The underlying binomial is obviously not like the almost-Gaussian
> assumed in microarrays/t-test-like approaches, but this kind of outcome
> is somehow intriguing me. Do people here have experience to share
> regarding how well such genes hold up through the qPCR verification step?

I have seen such genes as well in my data sets, and I am in fact worried
that DESeq does not do too great a job handling them.

[...]

In most data sets these are only very few genes, but still, it is not a
fully satisfactory state of affairs. I recently tested how edgeR deals with
the issue and found that it does not do a much better job in handling such
genes unless you have a large number of replicates.

[...]

Cheers
   Simon

+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de


---------------------------------------------
Professor Gordon K Smyth,
NHMRC Senior Research Fellow,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
