[BioC] tagwise parameters for negative binomial distribution in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Thu Mar 20 01:04:19 CET 2014


Dear Davide,

Do you want to identify tags (genes) with dispersion values that are so 
high (relative to other genes with similar count sizes) that they should 
be considered outliers?

The easiest way to do this is to use

   d <- estimateDisp(d, design, robust=TRUE)

and then look at the output values for prior.df:

   summary(d$prior.df)

Any tag with a small prior.df is considered an outlier.  You can sort tags 
by their prior.df values to select the most significant outliers.
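A minimal sketch of this workflow follows. It runs on simulated counts: the data, the group labels and the ten deliberately over-dispersed genes are assumptions made for illustration only, not part of the original reply. It shows one way to rank genes by prior.df after a robust estimateDisp() call.

```r
library(edgeR)

# Hypothetical simulated data: 1000 genes, two groups of three samples.
# The first 10 genes are given a much larger dispersion than the rest.
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, size = 10, mu = 50), nrow = 1000)
counts[1:10, ] <- rnbinom(10 * 6, size = 0.5, mu = 50)
rownames(counts) <- paste0("gene", 1:1000)
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

d <- DGEList(counts = counts, group = group)
d <- calcNormFactors(d)
d <- estimateDisp(d, design, robust = TRUE)

# With robust = TRUE, prior.df is a vector with one value per gene
summary(d$prior.df)

# Genes with the smallest prior.df are the strongest dispersion outliers
o <- order(d$prior.df)
head(data.frame(prior.df = d$prior.df[o],
                tagwise.dispersion = d$tagwise.dispersion[o],
                row.names = rownames(d)[o]), 10)
```

How small a prior.df must be before a gene counts as an outlier is a judgement call. Sorting, as above, avoids picking an arbitrary cutoff.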

Note that the methodology used by the estimateDisp() robust procedure is 
more complicated than simply using NB probabilities, because one has to 
take into account the genome-wide distribution of the dispersion values 
as well as accounting for the fact that the fitted values (p) have been 
estimated from the same data.  The methodology is mostly explained in:

  http://www.statsci.org/smyth/pubs/edgeRChapterPreprint.pdf
  http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf
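On the original question of how r and p relate: edgeR's negative binomial has mean mu and variance mu + phi*mu^2, where phi is the dispersion. So r = 1/phi and p = r/(r + mu) = 1/(1 + phi*mu). This is the same relationship that R's dnbinom() uses between its size/prob and size/mu arguments. Because mu depends on library size and experimental group, p is specific to each observation rather than to each tag. In practice the means would come from fitted values of a model fit (for example glmFit()), which, as noted above, are estimated from the same data. The values of mu and phi in this sketch are made up:

```r
# Hypothetical values for one tag in one sample (not from real data)
mu  <- 50    # fitted mean count for this tag in this sample
phi <- 0.1   # tagwise dispersion, e.g. one element of d$tagwise.dispersion

r <- 1 / phi          # NB "size" parameter
p <- r / (r + mu)     # NB "prob" parameter; equals 1 / (1 + phi * mu)

# The size/prob and size/mu parameterisations give the same probabilities
x <- 0:200
stopifnot(isTRUE(all.equal(dnbinom(x, size = r, prob = p),
                           dnbinom(x, size = r, mu = mu))))

# Mean and variance implied by (r, p) are mu and mu + phi * mu^2
c(mean = r * (1 - p) / p, var = r * (1 - p) / p^2)   # mean 50, var 300
```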

Best wishes
Gordon

> From: Davide Cittaro <cittaro.davide at hsr.it>
> To: "bioconductor at r-project.org list" <bioconductor at r-project.org>
> Subject: [BioC] tagwise parameters for negative binomial distribution in edgeR
>
> Dear list,
>
> I have a DGEList object in edgeR, already processed with
> calcNormFactors, estimateCommonDisp and estimateTagwiseDisp.
> Now I would like to identify tagwise outliers in my data. I thought I
> could estimate an NB distribution for each tag. Given that an NB
> distribution is defined by two parameters (r and p), and assuming that
> r = 1/x$tagwise.dispersion, how can I get the tagwise p from the
> DGEList object?
>
> Thanks
>
> d
