[BioC] tagwise parameters for negative binomial distribution in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Fri Mar 21 01:47:28 CET 2014

On Thu, 20 Mar 2014, Davide Cittaro wrote:

> On 20/mar/2014, at 01:04, Gordon K Smyth <smyth at wehi.EDU.AU> wrote:
>> Do you want to identify tags (genes) with dispersion values that are so
>> high (relative to other genes with similar count sizes) that they should
>> be considered outliers?
> Mmm, actually I would like to identify the sample that is an outlier for 
> a specific gene, that's why I thought I could focus on tagwise 
> distribution.

See Mark Robinson's post.

It depends on your purpose, however.  Do you want to downweight/ignore 
outliers, or do you want to identify them because they are interesting?

>> The easiest way to do this is to use
>>   d <- estimateDisp(d, design, robust=TRUE)
>> and then look at the output values for prior.df:
>>   summary(d$prior.df)
>> Any tag with a small prior.df is considered an outlier.  You can sort tags
>> by their prior.df values to select the most significant outliers.
> Does this identify a tag that is an outlier over all samples?

Basically yes.  We distinguish dispersion outliers and observation 
outliers.  An observation outlier is an individual count that is an 
outlier (relative to other counts for the same gene).  A dispersion 
outlier is a gene that shows much more variability between replicates than 
other genes at the same cpm level.  A dispersion outlier may arise from 
one or more observation outliers, but not necessarily.  It could also 
arise from systematically larger variability.
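
As a rough illustration of the prior.df approach quoted above, here is a 
minimal sketch on simulated data.  The counts, group labels, gene names and 
the two planted outlier genes are all made up for the example; only 
DGEList(), calcNormFactors() and estimateDisp() are edgeR functions.  Gene 1 
is given a single observation outlier, while gene 2 is given systematically 
larger variability across all replicates.

```r
library(edgeR)

set.seed(1)
ngenes   <- 1000
nsamples <- 6
group    <- factor(rep(c("A", "B"), each = 3))   # hypothetical two-group design
design   <- model.matrix(~group)

# Simulated negative binomial counts: mean 100, dispersion 0.05
# (rnbinom's size parameter is 1/dispersion).
counts <- matrix(rnbinom(ngenes * nsamples, mu = 100, size = 20),
                 nrow = ngenes, ncol = nsamples)
rownames(counts) <- paste0("gene", seq_len(ngenes))

# gene1: one observation outlier (a single extreme count).
counts[1, 1] <- 5000
# gene2: systematically larger variability (dispersion 2) in every sample.
counts[2, ] <- rnbinom(nsamples, mu = 100, size = 0.5)

d <- DGEList(counts = counts, group = group)
d <- calcNormFactors(d)
d <- estimateDisp(d, design, robust = TRUE)

# With robust=TRUE prior.df is expected to hold one value per tag;
# rep_len() is only a guard in case a single value is returned.
prior.df <- rep_len(d$prior.df, ngenes)
names(prior.df) <- rownames(counts)

summary(prior.df)
# Tags with the smallest prior.df are the strongest dispersion outliers.
head(sort(prior.df), 10)
```

Note that a small prior.df flags the gene as a whole, so both kinds of 
planted gene can show up here.  It does not say which sample is responsible; 
for that you would still need to inspect the individual counts of the 
flagged genes.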

