[BioC] tagwise parameters for negative binomial distribution in edgeR

Fri Mar 21 18:27:18 CET 2014

Hi Ina,

I don't think voom has any special consideration for observation 
outliers, but limma's 'eBayes' function has a 'robust' argument which I 
believe has the same effect as the corresponding argument in edgeR's 
'estimateDisp', i.e. dealing with outlier tags that have abnormally 
high (or low) variance.
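
In case it helps, here is a minimal sketch of the limma side on simulated counts. The count matrix, the two-group design and the spiked-in hypervariable gene are all made up for illustration; only voom(), lmFit() and eBayes(robust=TRUE) are the calls I actually mean, and I have not checked how this behaves on real data.

```r
library(edgeR)
library(limma)

set.seed(1)
ngenes <- 200
nsamples <- 6

# Simulated NB counts (hypothetical data, not from this thread).
counts <- matrix(rnbinom(ngenes * nsamples, mu = 50, size = 5),
                 nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)),
                                 paste0("s", seq_len(nsamples))))
# Make one gene wildly variable so there is something to flag.
counts[1, ] <- c(5, 4000, 30, 900, 2, 60)

group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

d <- DGEList(counts = counts)
d <- calcNormFactors(d)

v <- voom(d, design)                # log-CPM values plus precision weights
fit <- lmFit(v, design)
fit <- eBayes(fit, robust = TRUE)   # robust empirical Bayes on the variances

summary(fit$df.prior)
```

As far as I understand it, genes treated as variance outliers should end up with a smaller df.prior than the rest, analogous to the small prior.df values on the edgeR side.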

-Ryan

On Fri 21 Mar 2014 08:48:23 AM PDT, Ina Hoeschele wrote:
> Hi Mark,
>    How would observation outliers, which can in turn cause dispersion outliers, be handled in voom?
> Many thanks, Ina
>
>
> ----- Original Message -----
> From: "Mark Robinson" <mark.robinson at imls.uzh.ch>
> To: "Davide Cittaro" <cittaro.davide at hsr.it>
> Cc: "Bioconductor mailing list" <bioconductor at r-project.org>, "xiaobei.zhou at uzh.ch Zhou" <xiaobei.zhou at uzh.ch>, "Gordon Smyth" <smyth at wehi.edu.au>
> Sent: Thursday, March 20, 2014 9:58:05 AM
> Subject: Re: [BioC] tagwise parameters for negative binomial distribution in edgeR
>
> Hi Davide,
>
> Just to add another option, there is also estimateGLMRobustDisp(), a wrapper for an iteratively re-weighted scheme that downweights outliers, effectively using a constant prior degrees of freedom.  It was developed independently of estimateDisp(), from a different perspective.  What it gives you is a matrix of weights (one for each observation, not just for each tag) in which identified outliers should have lower weights.  So you could use it to identify outliers observation-wise or tag-wise (e.g. take the sum of the weights for each tag).  You'd want >= 3 replicates per condition for this one, though.
>
> In code, you could do something like:
>
> d <- estimateGLMRobustDisp(d, design)
> summary(d$weights)
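
A small sketch of how one might read such a weight matrix, using a toy matrix in place of d$weights so it runs without edgeR. It assumes the weights are laid out like the count matrix (tags in rows, samples in columns); the 0.5 cutoff is an arbitrary illustrative threshold, not a recommended value.

```r
# Toy stand-in for d$weights: 4 tags (rows) x 3 samples (columns).
w <- matrix(1, nrow = 4, ncol = 3,
            dimnames = list(paste0("tag", 1:4), paste0("s", 1:3)))
w["tag2", "s3"] <- 0.15   # a strongly downweighted observation
w["tag4", "s1"] <- 0.40   # a mildly downweighted observation

# Observation-wise: which (tag, sample) cells fall below the cutoff?
cutoff <- 0.5
low <- which(w < cutoff, arr.ind = TRUE)
print(low)

# Tag-wise: sum the weights over samples for each tag and
# list the tags with the smallest totals first.
tag_sum <- rowSums(w)
ranked <- sort(tag_sum)
print(ranked)
```

Tags whose summed weight is well below the number of samples are the ones where at least one observation has been downweighted.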
>
> More details can be found at:
>
> http://arxiv.org/abs/1312.3382
>
> http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/supplement.pdf
> (In particular, see Supplementary Figure 8, which shows ROC curves for the ability to separate [simulated] outliers by weights/residuals.  Yet another option is DESeq2's Cook's distance [observation-wise or max-by-tag].  We don't have a curve for estimateDisp!)
>
>
> Best regards, Mark
>
>
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics, Institute of Molecular Life Sciences
> University of Zurich
> http://ow.ly/riRea
>
> On 20.03.2014, at 01:04, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Davide,
>>
>> Do you want to identify tags (genes) with dispersion values that are so high (relative to other genes with similar count sizes) that they should be considered outliers?
>>
>> The easiest way to do this is to use
>>
>>   d <- estimateDisp(d, design, robust=TRUE)
>>
>> and then look at the output values for prior.df:
>>
>>   summary(d$prior.df)
>>
>> Any tag with a small prior.df is considered an outlier.  You can sort tags by their prior.df values to select the most significant outliers.
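
A sketch of that sorting step, with a toy named vector standing in for d$prior.df so it runs on its own. The gene names and values are invented, and the "less than half the maximum" rule is only an illustrative cutoff, not part of edgeR.

```r
# Toy stand-in for d$prior.df after estimateDisp(..., robust=TRUE).
# With real data you would probably set names(prior_df) <- rownames(d) first.
prior_df <- c(geneA = 12.3, geneB = 2.1, geneC = 12.3,
              geneD = 0.8, geneE = 11.7)

# Sort tags by prior.df, smallest (most outlying) first.
ord <- order(prior_df)
print(head(prior_df[ord], 3))

# Flag tags whose prior.df is far below the bulk (arbitrary rule).
flagged <- names(prior_df)[prior_df < 0.5 * max(prior_df)]
print(flagged)
```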
>>
>> Note that the methodology used by the estimateDisp() robust procedure is more complicated than simply using NB probabilities, because one has to take into account the genome-wide distribution of the dispersion values, as well as the fact that the fitted values (p) have been estimated from the same data.  The methodology is mostly explained in:
>>
>> http://www.statsci.org/smyth/pubs/edgeRChapterPreprint.pdf
>> http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf
>>
>> Best wishes
>> Gordon
>>
>>> From: Davide Cittaro <cittaro.davide at hsr.it>
>>> To: "bioconductor at r-project.org list" <bioconductor at r-project.org>
>>> Subject: [BioC] tagwise parameters for negative binomial distribution in edgeR
>>>
>>> Dear list,
>>>
>>> I have a DGEList object in edgeR, already processed with calcNormFactors, estimateCommonDisp and estimateTagwiseDisp. Now I would like to identify tagwise outliers in my data, and I thought I could estimate the NB distribution for each tag. Given that an NB is defined by two parameters (r and p), and assuming that r = 1/x$tagwise.dispersion, how can I get the tagwise p from the DGEList object?
>>>
>>> Thanks
>>>
>>> d
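
On the literal r and p question, a short base-R sketch of the conversion. With dispersion phi and mean mu, the NB variance is mu + phi*mu^2, so r = 1/phi and, in the convention used by R's dnbinom(), p = r/(r + mu) = 1/(1 + phi*mu). Note that p depends on mu, so it differs by library as well as by tag. The values of phi and mu below are invented; in practice mu would be the fitted mean for a given tag in a given library (taking it from, say, the fitted values of a GLM fit is an assumption on my part).

```r
phi <- 0.2   # hypothetical tagwise dispersion
mu  <- 50    # hypothetical fitted mean for one tag in one library

r <- 1 / phi        # NB "size" parameter
p <- r / (r + mu)   # success probability in dnbinom()'s convention

# The (size, prob) and (size, mu) parameterizations should agree.
x  <- 0:200
d1 <- dnbinom(x, size = r, prob = p)
d2 <- dnbinom(x, size = r, mu = mu)
print(all.equal(d1, d2))

# Tail probability of a count at least as large as an observed value.
obs <- 150
p_upper <- pnbinom(obs - 1, size = r, mu = mu, lower.tail = FALSE)
print(p_upper)
```

Some texts swap the roles of p and 1 - p, so it is worth checking which convention a given formula uses. As Gordon points out above, a tail probability like this ignores the uncertainty in the estimated mean and dispersion.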
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor