[BioC] DEXSeq all p-values are 1

Thu Dec 13 18:11:11 CET 2012

Hi,

On Thu, Dec 13, 2012 at 11:59 AM, Philip Jonsson
<philip.jonsson at gmail.com> wrote:
> I realize that it can be hard to nail down a source/source of this
> (possible) error.
>
> So, the data is four single-end runs - two treated and two control-treated
> samples. The design I set up looks like this:
>                                              condition replicate
> ./treatedsample1.txt          Treatment         1
> ./treatedsample2.txt          Treatment         2
> ./controlsample1.txt           Vehicle             1
> ./controlsample2.txt           Vehicle              2
>
> After using read.HTSeqCounts I use estimateSizeFactors, followed
> by estimateDispersions. For estimateDispersions I get the same results
> whether I set minCount to 0 or 10 and whether I set maxExon to default 70
> or something higher, like 1000. I have not tried to change initialGuess,
> since I don't really grasp what it means. I also leave formula unchanged.
> Afterwards I use fitDispersionFunction and testForDEU with default
> parameters.
>
> My ExonCountSet was created with the package's Python scripts. I used a GFF
> file from UCSC which is of the same genome build as used for the read
> mapping.
>
> There is variance across my replicates, but it doesn't seem too extreme. I
> could, as mentioned, call differentially expressed genes with DESeq without
> problems. Do the values for dispBeforeSharing, dispFitCoefs, or dispFitted
> tell me anything about this?

There is a `plotDispEsts` function which is defined in the DEXSeq
vignette (not sure why it's not included & exported in the package)
which is useful to see how your dispersions look.

This is the function:

plotDispEsts = function( cds, ymin, linecol="#ff000080",
  xlab = "mean of normalized counts", ylab = "dispersion",
  log = "xy", cex = 0.45, ... )
{
  px = rowMeans( counts( cds, normalized=TRUE ) )
  sel = (px>0)
  px = px[sel]

  py = fData(cds)$dispBeforeSharing[sel]
  if(missing(ymin))
      ymin = 10^floor(log10(min(py[py>0], na.rm=TRUE))-0.1)

  plot(px, pmax(py, ymin), xlab=xlab, ylab=ylab,
    log=log, pch=ifelse(py<ymin, 6, 16), cex=cex, ... )
  xg = 10^seq( -.5, 5, length.out=100 )
  fun = function(x) { cds at dispFitCoefs[1] + cds at dispFitCoefs[2] / x }
  lines( xg, fun(xg), col=linecol, lwd=4)
}

After you call estimateDispersions, you can call that on your
ExonCountSet to see how your dispersions "look".

That having been said, the description of the steps you are taking in
your analysis sound quite right, so it might be a (1) data quality
problem (but you say they look fine); or (2) no real differential exon
usage ;-)

Perhaps Simon and crew can chime in with some tips they might have
from experience w/ other data, since I guess they're likely to be the
most seasoned DEXSeq-ers.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact