[BioC] prior df estimation of estimateDisp function

Gordon K Smyth smyth at wehi.EDU.AU
Mon Feb 3 23:35:52 CET 2014


Dear Koen,

The results you are seeing are as one would expect for a technical 
datasets.  Remember that the dispersion measures biological variation. 
The replicates here are just re-sequencing (and perhaps some 
re-preparation) of exactly the same RNA samples, so there is no biological 
variation apart from slight inaccuracies in the preparation of the 
samples.  Hence the tagwise dispersions will be very small (or even zero) 
and nearly equal. Hence the prior df will be estimated to be large or even 
infinity.

Best wishes
Gordon

> Date: Sun, 2 Feb 2014 12:16:05 +0100
> From: Koen Van den Berge <koen.vdberge at gmail.com>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] prior df estimation of estimateDisp function
>
> Dear Bioconductors,
>
> In evaluating several RNA-seq analysis techniques, I decided to analyse 
> the same SEQC spiked-in dataset as mentioned in the limma-Voom paper 
> (Law et al. 2014, Genome Biology).

> This dataset contains 92 genes, 69 of which are spiked-in at different 
> concentrations. As in the paper, I copied the 23 non DE genes twice, 
> providing me a dataset with a total of 138 genes, half of which are 
> (non) DE.

> The authors use a filter on the dataset that requires a cpm higher than 
> 1 in at least 4 libraries.

> First I decided to analyse the data using EdgeR:
>
> spiked <- read.csv(filename)
>
> #Copy non DE genes
> copies <- spiked[id,]
> spiked2 <- rbind(spiked,copies)
> spiked3 <- rbind(spiked2,copies)
>
> #Design matrix
> libraries <- factor(rep(c("A" , "B" , "C" , "D"),each=4))
> design <- model.matrix(~ libraries)
>
> #make DGElist and normalize
> y <- DGEList(spiked3[,-1], group= libraries)
> y$samples$norm.factors <- normFactorsAll
>
> #estimate dispersion
> estimateDisip(y , design)
>
> In doing so, the estimateDisp() function provided me an estimate of prior df of 58.90.
>
> In assessing the effect of the filtering process, I followed the same 
> steps as above, however not applying the filtering process. The 
> estimateDisp() function then provided me an estimate of prior df of 
> Infinity, suggesting the trended dispersion estimation should be used. 
> Why is this happening? Is the ML dispersion estimation sensitive to 
> these low counts, resulting in an estimate of infinity for the prior 
> degrees of freedom?

> I can not really grasp this result, so any suggestions or help in this 
> topic are very welcome.
>
> Sincerely,
> Koen.

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list