[BioC] DESeq2 vs limma/voom: PCA completely different. How to proceed?

Stephen Turner vustephen at gmail.com
Tue Apr 23 20:39:25 CEST 2013


Thanks Steve. Using the variance stabilizing transformation on the
count data helped very much. The PCA on the variance stabilized data
look very similar to the PCA on the limma data.

On Tue, Apr 23, 2013 at 2:06 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi Stephen,
>
> Later on in the DESeq2 (and original DESeq) vignette, the authors
> recommend shooting your counts through some type of variance
> stabilizing transformation (either the rlog or vst) for exploratory
> analysis of the type you are performing.
>
> Perhaps you can try that and see how it compares.
>
> HTH,
> -steve
>
>
> On Tue, Apr 23, 2013 at 10:10 AM, Stephen Turner <vustephen at gmail.com> wrote:
>> Short story: I ran a PCA on a matrix of counts from an RNA-seq
>> experiment and then using the voom-tramsformed data, and they're
>> completely different. I imagine my results will be very different
>> using DESeq2 vs limma, and I am wondering how to proceed. Here are the
>> two PCA biplots:
>>
>> http://imgur.com/a/DBRfE
>>
>> More background:
>>
>> I have an RNA-seq experiment with 5 conditions: WT (n=4), and mutants
>> A, B, C, and D (all n=3). I aligned the reads with STAR, counted reads
>> mapping to genes using HTSeq-count. I imported the count data into
>> DESeq2 and processed using the functions described in the vignette,
>> DESeqDataSetFromHTSeqCount() and DESeq().
>>
>> I performed a PCA on the transposed normalized counts table from the
>> DESeq Data Set (dds) object (note the outlier, which is no longer
>> outlying in the voom PCA plot below):
>>
>> pca <- as.data.frame(prcomp(t(na.omit(counts(dds, normalized=TRUE))))$x)
>> col <- 1:5
>> groups <- colData(dds)$condition
>> plot(PC2~PC1, data=pca, bg=col[groups], pch=21, main="DESeq2")
>>
>> Here's that PCA biplot:
>>
>> http://i.imgur.com/P5hhE9Q.png
>>
>> I then used limma/voom to transform the data, and performed another
>> PCA, which looked completely different.
>>
>> design <- model.matrix(~0+colData(dds)$condition)
>> v <- voom(counts=counts(dds, normalized=TRUE), design=design)
>> pca <- as.data.frame(prcomp(t(na.omit(v$E)))$x)
>> pca(PC2~PC1, data=pca, bp=col[groups], pch=21, main="voom")
>>
>> Here's that PCA (notice the outlier C1 is no longer outlying):
>>
>> http://i.imgur.com/oKlnWh3.png
>>
>> Based on this PCA I imagine my results will look very different using
>> DESeq2 or limma on the voom'd data. I understand that this is due to
>> the PCA on the voom transformation is only considering the normalized
>> log2 expression values, and not considering the inverse variance
>> weights. Now I'm just not sure how to proceed with downstream analysis
>> (or quality control with the voom transformation, in general). Any
>> guidance is highly appreciated.
>>
>> Thanks,
>> Stephen
>>
>> ...
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets
>> methods   base
>>
>> other attached packages:
>>  [1] limma_3.16.1            calibrate_1.7.1         MASS_7.3-26
>>  [4] DESeq2_1.0.8            RcppArmadillo_0.3.800.1 Rcpp_0.10.3
>>  [7] lattice_0.20-15         Biobase_2.20.0          GenomicRanges_1.12.2
>> [10] IRanges_1.18.0          BiocGenerics_0.6.0      BiocInstaller_1.10.0
>>
>> loaded via a namespace (and not attached):
>>  [1] annotate_1.38.0      AnnotationDbi_1.22.2 DBI_0.2-5
>>  [4] genefilter_1.42.0    grid_3.0.0           locfit_1.5-9
>>  [7] RColorBrewer_1.0-5   RSQLite_0.11.3       splines_3.0.0
>> [10] stats4_3.0.0         survival_2.37-4      tools_3.0.0
>> [13] XML_3.95-0.2         xtable_1.7-1
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Department of Bioinformatics and Computational Biology
> Genentech



More information about the Bioconductor mailing list