[BioC] A few Q's on using DEXSeq with mucho data

Thu Mar 8 16:26:37 CET 2012

Hi Steve,

A thought struck me right after I sent out this email, specifically
about your "hunch" of using all the data to estimate and fit the
dispersion for each "bin".

Perhaps it's a good idea to use all the data for `estimateDispersions`
(I'm still quite curious about that), but it seems that running the
subsequent `fitDispersionFunction` step against all of the data at
once might not be a great idea.

I say this because, from looking through the code, it seems like the
fitted dispersion for each exon/bin is determined by the (normalized)
mean expression of that bin across the entire dataset. So, if the
expression of the gene (and exon) is quite variable across all
conditions we have data for, then when we go back and test for
differential exon usage in a specific condition 1 vs. condition 2
comparison, I think we'd rather fit the dispersion using the mean
expression of that bin in just the two conditions under test.

Isn't that right?
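
To make the concern concrete, here is a minimal sketch in Python (not
DEXSeq code). It assumes the fitted trend has the form
alpha(mu) = asympt_disp + extra_pois / mu. That form, the coefficient
values, the helper name `fitted_dispersion`, and the counts are all
made up for illustration. It only shows how the mean used for the
lookup changes the dispersion you read off the trend.

```python
import numpy as np


def fitted_dispersion(mean, asympt_disp=0.05, extra_pois=2.0):
    """Hypothetical mean-dispersion trend: asympt_disp + extra_pois / mean.

    The functional form and both coefficients are assumptions chosen
    for illustration, not values taken from any real fit.
    """
    return asympt_disp + extra_pois / mean


# Made-up normalized counts for a single exon bin:
# four conditions, two replicates each.
counts = {
    "cond1": np.array([12.0, 8.0]),
    "cond2": np.array([9.0, 11.0]),
    "cond3": np.array([480.0, 520.0]),
    "cond4": np.array([390.0, 410.0]),
}

# Mean expression of the bin across the entire dataset.
mean_all = np.mean(np.concatenate(list(counts.values())))

# Mean expression using only the two conditions under test.
mean_pair = np.mean(np.concatenate([counts["cond1"], counts["cond2"]]))

# Dispersion read off the same trend at each mean.
disp_all = fitted_dispersion(mean_all)
disp_pair = fitted_dispersion(mean_pair)

print(f"mean over all conditions: {mean_all:.1f}, dispersion: {disp_all:.4f}")
print(f"mean over cond1 + cond2:  {mean_pair:.1f}, dispersion: {disp_pair:.4f}")
```

With these made-up numbers the bin is lowly expressed in the two
conditions being compared but highly expressed elsewhere, so the
dataset-wide mean gives a much smaller fitted dispersion than the mean
of the two conditions under test would.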

Sorry for the spam, just trying to reason a bit about this stuff ...

Thanks for any help,
-steve

On Thu, Mar 8, 2012 at 10:08 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> Imagine, if you will, that I have data for 7-8 different conditions
> and I'd like to use DEXSeq to test for differential exon usage.
>
> Is it best to create an ExonCountSet with all of my data (with an
> appropriate design matrix that identifies which samples belong to
> which condition) and run the `estimateDispersions` step with that?
>
> My guess is yes, but I wanted to double check -- am I in danger of
> flagging some genes/exons as non-testable if, for example, they are
> only expressed in 2 out of 8 conditions?
>
> Assuming I should use all of my data for `estimateDispersions`, what
> if I only have one replicate for one of the 8 conditions? Is it best
> to remove it? I'm guessing it wouldn't provide any meaningful
> information for dispersion estimation, since it's only one
> observation for that condition.
>
> Lastly, when testing for differences in exon usage, assuming I've been
> using the data from all of my experiments up to this point, I don't
> see a way to specify which experiments I specifically want to test.
>
> If I run "the normal" DEXSeq analysis on my data, I end up with a
> DEUresultTable that looks like it has log2fold(x/y) values for all
> experiments against just one y. There is only one pvalue/padjust
> column, so I'm not sure what comparison that is for.
>
> Thanks for any help,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact