[BioC] DEXSeq: Problems trying to reproduce the example of the vignette

Wed Jun 12 15:30:33 CEST 2013

Hi!

As the subject says, I've got some problems with DEXSeq. For the
moment, I am just trying to reproduce the example of the Vignette. The
first error that occurrs is when I try to plot the per-gene disperion
estimates using plotDispEsts:

> library("DEXSeq")
> library("pasilla")
> library("parallel")
> data("pasillaExons", package="pasilla")

> pasillaExons <- estimateSizeFactors(pasillaExons)
> pasillaExons <- estimateDispersions(pasillaExons, nCores=3)
Estimating Cox-Reid exon dispersion estimates using 3 cores. (Progress
report: one dot per 100 genes)
> pasillaExons <- fitDispersionFunction(pasillaExons)

> plotDispEsts(pasillaExons)
Error: is(cds, "CountDataSet") is not TRUE
> traceback()
4: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
       ch), call. = FALSE, domain = NA)
3: stopifnot(is(cds, "CountDataSet"))
2: fitInfo(cds, name = name)
1: plotDispEsts(pasillaExons)

What am I doing wrong here? In general I work with the vignette which
comes with the package (last revision 2013-02-27), but I found a piece
of code in an older version which seems to work however:

> meanvalues <- rowMeans(counts(pasillaExons))
> plot(meanvalues, fData(pasillaExons)$dispBeforeSharing, log = "xy", main = "mean vs CR dispersion")
> x <- 0.01:max(meanvalues)
> y <- pasillaExons at dispFitCoefs[1] + pasillaExons at dispFitCoefs[2]/x
> lines(x, y, col = "red")

The second problem I have got is the MA plot after testing for
differential exon usage:

> pasillaExons <- testForDEU(pasillaExons, nCores=3)
Testing for differential exon usage using 3 cores. (Progress report:
one dot per 100 genes)
> pasillaExons <- estimatelog2FoldChanges(pasillaExons)
> res <- DEUresultTable(pasillaExons)
> plotMA(with(res, data.frame(baseMean=meanBase, log2FoldChange=log2fold, padj=padjust)), ylim=c(-4,4), cex=0.8)

This works fine. But I wanted to change the threshold for the color
highlighting to "padjust < 0.05" (although this wouldn't change
anything in this case, I just wanted to try). So I did the following:

> plotMA(with(res, data.frame(baseMean=meanBase, log2FoldChange=log2fold, padj=padjust)), ylim=c(-4,4), cex=0.8, col=ifelse(res$padj >= 0.05, "gray32", "red3"))

But whenever I use the col parameter with whatever value (also with
simply one color), the output is different and more data points than
before get displayed. I do not really understand this effect.

I worked with R 2.15 and the newest version of DEXSeq but updated
everything yesterday (R and all packages, see sessionInfo below) to
see if this changes something, but it didn't. Of course I can just
copy the function from the source and define my own one changing the
threshold, but I'd prefer to understand what's going on here.

An another small question regarding this plot: Reading the vignette I
could not find out what the triangles mean which sometimes appear. Do
they indicate outliers (lying outside the plotting area), or is this
something completely different?

And my last question: I read in
http://seqanswers.com/forums/showthread.php?t=13299 that it should be
possible with DEXSeq to analyze paired samples as well. In my case I
have for every individual patient two (or more) different celltypes
which I would like to analyze, so I would have to use a paired test
for differential exon usage. In the thread mentioned it was said that
there is more information about how to do this in the vignette
available, but I cannot find it. How would I have to add the patient
information to account for paired samples in DEXSeq?

Sorry for asking that many (pretty ignorant) questions, it's the first
time I am trying to use a package dealing with RNAseq-data, and I hope
someone can help.

Best whishes,
Simone

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets
methods   base

other attached packages:
[1] pasilla_0.2.16     DESeq_1.12.0       lattice_0.20-15
locfit_1.5-9.1     DEXSeq_1.6.0
[6] Biobase_2.20.0     BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.6 biomaRt_2.16.0
Biostrings_2.28.0    bitops_1.0-5
 [6] DBI_0.2-7            genefilter_1.42.0    geneplotter_1.38.0
GenomicRanges_1.12.4 grid_3.0.1
[11] hwriter_1.3          IRanges_1.18.1       RColorBrewer_1.0-5
RCurl_1.95-4.1       Rsamtools_1.12.3
[16] RSQLite_0.11.4       splines_3.0.1        statmod_1.4.17
stats4_3.0.1         stringr_0.6.2
[21] survival_2.37-4      tools_3.0.1          XML_3.96-1.1
xtable_1.7-1         zlibbioc_1.6.0