[BioC] [Bioc-sig-seq] EdgeR questions in analyzing 454 data-about prior.n, TMM, and p_value

Mark Robinson mrobinson at wehi.EDU.AU
Tue Oct 19 00:43:10 CEST 2010


Hi Ying.

Some comments below.

On 2010-10-18, at 10:22 PM, Ying Ye wrote:

> Dear edgeR users and developers,
> 
> I have a few questions about edgeR, which I have recently been using for 454
> pyrosequencing data:
> 
> 1. prior.n
>     According to the user manual, prior.n should not be set too low in the
> moderated tagwise dispersion approach. But in my dataset there are
> more than 15 samples in each comparison group, and the degrees of freedom
> exceed 30. prior.n <- estimateSmoothing(d) gives 0.0005329. So I
> am wondering whether I can use 0.0005329, since I have a rather large number
> of samples in each group, or whether I should set prior.n to 10 as
> the manual suggests.

Well, it's hard to give a prescription for prior.n that suits all datasets.  Since you have so many degrees of freedom, you shouldn't need prior.n as high as 10.  You might try something lower, say 1-3.


> 2. TMM
>     I am not sure whether this is also applicable to 454 microbiota data.
> I suppose I should do TMM normalization as well, since the
> normalization factors for my samples vary widely (f ranges from
> 0.41 to 4.58). Is that right?

I must admit that I'm not intimately aware of all the nuances of microbiota data, but I will say that the factors you mention above are more extreme (both lower and higher) than we generally see in RNA-seq data.  I'd say it's probably best to look at some "smear" plots -- through maPlot() for example -- to assess whether the TMM normalization is appropriately capturing shifts due to composition or the like.

As always for exploratory analysis, it would be good to look at multidimensional scaling plots -- see plotMDS.dge().  There is no substitute for looking at your data.
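
As a rough sketch of the checks described above (not a prescription): the count table below is simulated and purely hypothetical, standing in for a real taxa-by-sample table, and the sample pairing in the smear plot is arbitrary. It assumes edgeR is installed. plotMDS() is used here on the assumption that, in more recent edgeR releases, it covers what plotMDS.dge() did.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 200 taxa x 6 samples, two groups of 3 (simulated, not real 454 counts)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200)
rownames(counts) <- paste0("taxon", 1:200)
group <- factor(rep(c("A", "B"), each = 3))

d <- DGEList(counts = counts, group = group)
d <- calcNormFactors(d)            # TMM is the default method
print(d$samples$norm.factors)      # inspect the spread of the factors

# Smear plot of sample 4 against sample 1; the horizontal line marks the
# shift implied by the ratio of the two TMM factors. If TMM is capturing
# the composition shift, the line should pass through the bulk of the points.
maPlot(d$counts[, 1], d$counts[, 4], normalize = TRUE, pch = 19, cex = 0.4)
abline(h = log2(d$samples$norm.factors[4] / d$samples$norm.factors[1]),
       col = "blue", lwd = 2)

# Multidimensional scaling plot of all samples
plotMDS(d)
```

With real data, the same smear plot can be repeated for several sample pairs, especially those with the most extreme factors.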


> 3. p_value
>     In your experience, is it reasonable and reliable to
> use p_value < 0.05 as the significance criterion? Or is only < 0.01
> reliable?

First off, you'll probably want to do some multiple testing correction, which can be done through the topTags() function.  As to where to set the threshold on significance, that is a matter of your false discovery tolerance ... the status quo is 5%, but you may want to be more or less stringent.
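
To illustrate why the correction matters, here is a minimal base-R sketch using p.adjust() with the Benjamini-Hochberg method (which, as far as I know, is also the default adjust.method in topTags()). The p-values are made up for illustration only.

```r
# Hypothetical raw p-values for 10 taxa (made up for illustration)
p <- c(0.0004, 0.003, 0.012, 0.025, 0.04, 0.045, 0.2, 0.5, 0.7, 0.9)

# Benjamini-Hochberg adjusted p-values (false discovery rate)
fdr <- p.adjust(p, method = "BH")

n_raw <- sum(p < 0.05)     # taxa passing an unadjusted 0.05 cutoff
n_fdr <- sum(fdr < 0.05)   # taxa passing a 5% FDR cutoff

print(data.frame(p = p, fdr = round(fdr, 4)))
cat("raw p < 0.05:", n_raw, " FDR < 0.05:", n_fdr, "\n")
```

In this toy example fewer taxa survive the FDR cutoff than the raw cutoff, which is the usual pattern; how strict to be remains a judgment about false discovery tolerance.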

Hope that helps.

Mark



> I am a new user of this package and hope you can offer some
> suggestions. Many thanks!
> 
> Ying Ye

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------