[BioC] DEXSeq estimating dispersions and multiple conditions

Natasha Sahgal nsahgal at well.ox.ac.uk
Tue Jan 29 13:05:06 CET 2013


Hello again,

I think I found the reason for that strange behaviour.

When Steve mentioned the drop.levels case whilst subsetting data, I decided to check my code from the beginning.
The design for my ecs object accidentally had an additional level. On correcting that and rerunning everything again it appears to be fine.

sum(!is.na(fData(ecs)$dispBeforeSharing)) / nrow(fData(ecs)) 
[1] 0.4693243   


Best,
Natasha


-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Natasha Sahgal
Sent: 29 January 2013 10:34
To: Steve Lianoglou
Cc: bioconductor at r-project.org
Subject: Re: [BioC] DEXSeq estimating dispersions and multiple conditions

Hi Steve,

Thanks for the rather quick response.

Based on what you said, I did check for what fraction of exons have an estimated dispersion, unfortunately it is zero! 
> sum(!is.na(fData(ecs)$dispBeforeSharing)) / nrow(fData(ecs))
[1] 0

Also tried the subsetting the pasilla dataset for a similar design as you mentioned, and yes it did appear to work.


Many Thanks,
Natasha

-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
Sent: 28 January 2013 21:25
To: Natasha Sahgal
Cc: bioconductor at r-project.org
Subject: Re: [BioC] DEXSeq estimating dispersions and multiple conditions

Hi,

I'll try to offer some suggestions until the authors have time to respond.

Hopefully I don't lead you too far off the correct path:

On Mon, Jan 28, 2013 at 11:31 AM, Natasha Sahgal <nsahgal at well.ox.ac.uk> wrote:
> Dear List,
>
> I am using DEXSeq to test for differential exon usage on a dataset with 3 samples (2 belong to a treatment group and 1 to a control group) and have 2 main questions.
>
> 1) Unbalanced design where one out of two groups has no replicates
>
> Name    condition
> a1      control
> b1      treatment
> b2      treatment
>
> Now, I have read in a couple of other threads that there is no point in running DEXSeq when there are no replicates (unlike in case of DESeq - which I have already run for these). However, the researcher still wants me to try it out.
>
> Thus, I was wondering if there in an unbalanced design, is it still possible to run DEXSeq and would estimateDispersions still work? When I run this function, I don't get errors but when I run fitDispersionFunction is doesn't like it.
> ########
>> ecs <- estimateDispersions(ecs, nCores=8)
> Estimating Cox-Reid exon dispersion estimates using 8 cores. (Progress
> report: one dot per 100 genes)
> ......................................................................
> ......................................................................
> ....................>
>>
>>
>> ecs <- fitDispersionFunction(ecs)
> Error in fitDispersionFunction(ecs) :
>   no CR dispersion estimations found, please first call 
> estimateDispersions function
>>
>> head(fData(ecs))
>                               geneID exonID testable dispBeforeSharing
> ENSG00000000003:E001 ENSG00000000003   E001     TRUE                NA
> ENSG00000000003:E002 ENSG00000000003   E002     TRUE                NA

[cut]

The portion of the fData table you pasted look to be all NA for the `dispBeforeSharing` estimate, which (if true) would suggest that something is going wrong during the estimateDisperion step.

After you run `ecs <- estimateDispersions(ecs, nCores=8)`, what fraction of your exons has an estimated dispersion? eg:

R> sum(!is.na(fData(ecs)$dispBeforeSharing)) / nrow(fData(ecs))

Hopefully it's greater than 0?

A quick check seems to suggest that one could do what you need to (2 reps in condA and 1) using DEXSeq as is. Try it by creating a similar unbalanced data set from the pasilla package using only the single-read data:

R> library(DEXSeq)
R> data("pasillaExons", package="pasilla") p <- pasillaExons[, 
R> pData(pasillaExons)$type == 'single-read']
R> pData(p) <- droplevels(pData(p))
R> pData(p)[, 1:3]
              sizeFactor condition        type
treated1fb  |         NA   treated single-read
untreated1fb|         NA untreated single-read
untreated2fb|         NA untreated single-read

R> p <- estimateSizeFactors(p)
R> p <- estimateDispersions(p)
R> p <- fitDispersionFunction(p)
R> sum(!is.na(fData(p)$dispBeforeSharing)) / nrow(fData(p))
[1] 0.7248996


> Also, I read in a thread somewhere where Simon suggested to set fData(ecs)$dispersion <- .1, if it is necessary for a 1 vs 1, but does not fully recommend it. Is this what I need to do?
>
>
> 2) How does one work with multiple groups (or conditions)?
> In the vignette, there is nothing specified and neither can I see anything in the testforDEU function. Unlike DESeq (or edgeR) where one can specify. In DESeq: for e.g. if there are 3 groups: Control, Treatment & Mutant:
>
> res1 <- nbinomTest(cds, "Control", "Treatment") and/or
> res2 <- nbinomTest(cds, "Control", "Mutant")

If you want to explicitly only test for two conditions (and ignore the other ones), I believe that, currently, your only recourse is to subset your ExonCountSet to include only the two conditions you want to test. Make sure you do the "droplevels" trick on the pData like I did above (this will remove factors in your `condition` column that represent the conditions you've removed). After that, you can do the testForDEU.

btw, the above was run on R 2.15.2 and DEXSeq 1.4.0

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology  | Memorial Sloan-Kettering Cancer Center  | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list