[BioC] edgeR on microRNA data

Gordon K Smyth smyth at wehi.EDU.AU
Sat Oct 1 10:09:53 CEST 2011


Dear Helena,

Compared with mRNA-Seq, you have an unusually small number of transcripts 
but a relatively large number of biological replicates.  This suggests 
that you should use a relatively small value for prior.n but a relatively 
large value for prop.used.  I am concerned that you have decreased 
prop.used from its default value of 0.3.  I would tend to increase this 
rather than decrease it.

On the other hand, you have increased prior.n from its default value, 
which for your data would be a little over 0.5.  Is this simply because it 
gave better looking results?  Anyway, increasing prior.n does not result 
in overfitting.  The risk with larger prior.n is simply that it may start 
to return differentially expressed miRs that are increased or decreased in 
only a few of the samples, rather than consistently for all samples in a 
group.
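
As a rough check on the "little over 0.5" figure, here is a small sketch in 
base R.  It assumes that the default prior.n is chosen to give about 20 prior 
degrees of freedom, i.e. 20 divided by the residual degrees of freedom 
(libraries minus groups).  That rule and the variable names are assumptions 
for illustration, so please check the help page for your edgeR version.

```r
# Assumption (verify against your edgeR version's documentation):
# default prior.n = prior.df / (number of libraries - number of groups),
# with prior.df = 20.
group.sizes <- c(10, 15, 15)          # group sizes from the original post
prior.df    <- 20                     # assumed default prior degrees of freedom
residual.df <- sum(group.sizes) - length(group.sizes)   # 40 - 3 = 37

default.prior.n <- prior.df / residual.df
round(default.prior.n, 2)

# Under the same assumption, the prior degrees of freedom implied by the
# prior.n values tried in the original post:
tried.prior.n <- c(2, 10)
implied.prior.df <- tried.prior.n * residual.df
implied.prior.df
```

Seen this way, prior.n=2 or 10 puts far more weight on the common or trended 
dispersion than the assumed default would.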

Your experience with prior.n is unintuitive to me.  Generally speaking, 
choosing prior.n small means that each miR gets to set its own dispersion, 
so that miRs with large variance will not appear in the topTags list.  When 
you say "variance outliers", do you mean large or small variance?

Since your minimum group sample size is 10, I would have required miRs to 
satisfy your cpm requirement in >= 10 samples rather than 5.
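
A minimal sketch of that filter in base R follows.  The count matrix is 
simulated toy data and all object names (counts, group, keep and so on) are 
hypothetical.  With real data you would start from your own count matrix, and 
if your edgeR version provides cpm() you could use that instead of computing 
counts per million by hand.

```r
# Toy data only: 600 hypothetical miRs by 40 samples (groups of 10, 15, 15).
set.seed(1)
group  <- factor(rep(c("A", "B", "C"), times = c(10, 15, 15)))
counts <- matrix(rnbinom(600 * 40, mu = 50, size = 2), nrow = 600, ncol = 40)

# Counts per million, computed by hand from the library sizes.
lib.size <- colSums(counts)
cpm.mat  <- t(t(counts) / lib.size) * 1e6

# Keep miRs that reach the cpm cutoff in at least as many samples as the
# smallest group contains (10 here), rather than in only 5 samples.
min.group.size <- min(table(group))
keep <- rowSums(cpm.mat >= 0.2) >= min.group.size
counts.filtered <- counts[keep, , drop = FALSE]
```

Tying the threshold to the smallest group size means a miR expressed in only 
one group can still pass the filter.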

Best wishes
Gordon

> Date: Thu, 29 Sep 2011 05:25:14 +0000
> From: Helena Persson <helena.persson at ki.se>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] edgeR on microRNA data
>
> Hi,
>
> I would be grateful for some input on using edgeR for small RNA sequence 
> data. I have been testing edgeR on a set of miRNA data (3 groups with 
> n=10, 15 and 15). After removing genes that are not expressed at >= 0.2 
> cpm in >= 5 samples I have ~600 rows left. I tried calculating the 
> tagwise dispersion estimate with:
>
> cds1 <- estimateTagwiseDisp(cds1, prior.n=2, trend=TRUE, prop.used=0.1, 
> grid=FALSE)
>
> Increasing the prior to e.g. 10 gives more differentially expressed 
> genes that do not look bad. Decreasing the prior to 0 leaves me with 
> extremely few differentially expressed genes that are mainly variance 
> outliers. I guess that miRNA data is likely to behave differently from 
> mRNA data since there are so few genes (but still a very large dynamic 
> range). Is it possible that I am over-fitting the estimate? Would you 
> recommend changing any other parameters?
>
> Best regards,
> Helena
> _________________________________
>
> Helena Persson, PhD
>
> Karolinska Institutet
> Dept of Biosciences and Nutrition
> Hälsovägen 7-9
> SE-141 83 Huddinge
> Sweden
>
> Helena.Persson at ki.se
>
> tel. +46-(0)8-52481058
