[BioC] DEXSeq estimating dispersions and multiple conditions

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jan 28 22:25:03 CET 2013


Hi,

I'll try to offer some suggestions until the authors have time to respond.

Hopefully I don't lead you too far off the correct path:

On Mon, Jan 28, 2013 at 11:31 AM, Natasha Sahgal <nsahgal at well.ox.ac.uk> wrote:
> Dear List,
>
> I am using DEXSeq to test for differential exon usage on a dataset with 3 samples (2 belong to a treatment group and 1 to a control group) and have 2 main questions.
>
> 1) Unbalanced design where one out of two groups has no replicates
>
> Name    condition
> a1      control
> b1      treatment
> b2      treatment
>
> Now, I have read in a couple of other threads that there is no point in running DEXSeq when there are no replicates (unlike in case of DESeq - which I have already run for these). However, the researcher still wants me to try it out.
>
> Thus, I was wondering if there in an unbalanced design, is it still possible to run DEXSeq and would estimateDispersions still work? When I run this function, I don't get errors but when I run fitDispersionFunction is doesn't like it.
> ########
>> ecs <- estimateDispersions(ecs, nCores=8)
> Estimating Cox-Reid exon dispersion estimates using 8 cores. (Progress report: one dot per 100 genes)
> ................................................................................................................................................................>
>>
>>
>> ecs <- fitDispersionFunction(ecs)
> Error in fitDispersionFunction(ecs) :
>   no CR dispersion estimations found, please first call estimateDispersions function
>>
>> head(fData(ecs))
>                               geneID exonID testable dispBeforeSharing
> ENSG00000000003:E001 ENSG00000000003   E001     TRUE                NA
> ENSG00000000003:E002 ENSG00000000003   E002     TRUE                NA

[cut]

The portion of the fData table you pasted look to be all NA for the
`dispBeforeSharing` estimate, which (if true) would suggest that
something is going wrong during the estimateDisperion step.

After you run `ecs <- estimateDispersions(ecs, nCores=8)`, what
fraction of your exons has an estimated dispersion? eg:

R> sum(!is.na(fData(ecs)$dispBeforeSharing)) / nrow(fData(ecs))

Hopefully it's greater than 0?

A quick check seems to suggest that one could do what you need to (2
reps in condA and 1) using DEXSeq as is. Try it by creating a similar
unbalanced data set from the pasilla package using only the
single-read data:

R> library(DEXSeq)
R> data("pasillaExons", package="pasilla")
R> p <- pasillaExons[, pData(pasillaExons)$type == 'single-read']
R> pData(p) <- droplevels(pData(p))
R> pData(p)[, 1:3]
              sizeFactor condition        type
treated1fb  |         NA   treated single-read
untreated1fb|         NA untreated single-read
untreated2fb|         NA untreated single-read

R> p <- estimateSizeFactors(p)
R> p <- estimateDispersions(p)
R> p <- fitDispersionFunction(p)
R> sum(!is.na(fData(p)$dispBeforeSharing)) / nrow(fData(p))
[1] 0.7248996


> Also, I read in a thread somewhere where Simon suggested to set fData(ecs)$dispersion <- .1, if it is necessary for a 1 vs 1, but does not fully recommend it. Is this what I need to do?
>
>
> 2) How does one work with multiple groups (or conditions)?
> In the vignette, there is nothing specified and neither can I see anything in the testforDEU function. Unlike DESeq (or edgeR) where one can specify. In DESeq: for e.g. if there are 3 groups: Control, Treatment & Mutant:
>
> res1 <- nbinomTest(cds, "Control", "Treatment")
> and/or
> res2 <- nbinomTest(cds, "Control", "Mutant")

If you want to explicitly only test for two conditions (and ignore the
other ones), I believe that, currently, your only recourse is to
subset your ExonCountSet to include only the two conditions you want
to test. Make sure you do the "droplevels" trick on the pData like I
did above (this will remove factors in your `condition` column that
represent the conditions you've removed). After that, you can do the
testForDEU.

btw, the above was run on R 2.15.2 and DEXSeq 1.4.0

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list