[BioC] scholarly reference for "don't draw PCA/heatmap dendrograms on DEGs"

Davis, Wade davisjwa at health.missouri.edu
Tue Dec 10 18:17:39 CET 2013

I second the comment made by Aaron. I use PCA only for QC discovery and exploration prior to any testing for DEGs.
I also agree with the original comment that this is a self-fulfilling prophecy (though it does not always turn out that clearly). Anyone who thinks that PCA plots based on DE features (obtained from the same data set) will confirm their findings is optimistically biased at best.
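To make the self-fulfilling-prophecy point concrete, here is a minimal Python/NumPy sketch. It is not from the original thread: the sample sizes, the seed, the top-50 cutoff and the helper names `pca_scores`, `separation` and `t_stats` are all illustrative assumptions. It simulates pure noise with no true group difference, picks the 50 genes with the largest |t| from that same data, and compares how well PC1 separates the two groups when PCA is run on all genes versus only on the selected "DEGs".

```python
import numpy as np

def pca_scores(x, n_components=2):
    """Centered (unscaled) PCA via SVD of a genes x samples matrix; returns sample scores."""
    # Center each gene (row) across samples, then treat samples as observations.
    centered = x - x.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def separation(scores, groups):
    """Absolute difference in group means divided by the pooled SD (Cohen's d)."""
    a, b = scores[groups == 0], scores[groups == 1]
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return abs(a.mean() - b.mean()) / np.sqrt(pooled_var)

def t_stats(x, groups):
    """Pooled-variance two-sample t statistic for every gene (row)."""
    a, b = x[:, groups == 0], x[:, groups == 1]
    na, nb = a.shape[1], b.shape[1]
    pooled_var = ((na - 1) * a.var(axis=1, ddof=1) + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * (1 / na + 1 / nb))

rng = np.random.default_rng(2013)
n_genes, n_per_group = 2000, 5
groups = np.repeat([0, 1], n_per_group)
# Pure noise: no gene truly differs between the two "conditions".
expr = rng.normal(size=(n_genes, 2 * n_per_group))

# "DEGs": the 50 genes with the largest |t|, selected from this same data set.
top = np.argsort(-np.abs(t_stats(expr, groups)))[:50]

global_sep = separation(pca_scores(expr)[:, 0], groups)
deg_sep = separation(pca_scores(expr[top])[:, 0], groups)
print(f"PC1 group separation, all genes:        {global_sep:.2f}")
print(f"PC1 group separation, top-50 'DEGs':    {deg_sep:.2f}")
```

On null data like this, PCA on the selected genes should show a clean split between the groups while the all-gene PCA typically does not, which is why such a plot cannot serve as independent confirmation of the DE results.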


-----Original Message-----
From: Aaron Mackey [mailto:ajmackey at gmail.com] 
Sent: Monday, December 09, 2013 10:18 AM
To: Cook, Malcolm
Cc: Bioconductor mailing list
Subject: Re: [BioC] scholarly reference for "don't draw PCA/heatmap dendrograms on DEGs"

On Mon, Dec 9, 2013 at 10:38 AM, Cook, Malcolm <MEC at stowers.org> wrote:

> Have you done either on ALL (not just DE) genes?  If so, do your 
> replicates cluster?  Further, if so, do the distances between 
> replicate clusters scale in any interesting way with condition (i.e. higher dose or
> better knockdown or longer exposure -> further away from untreated).   I
> think this can be taken as "evidence" for condition effects that you 
> and your colleague should expect.  Do you agree with this?

In my experience, I do occasionally see "global" (all genes) clustering in (*scaled* and centered) PCA that corresponds to experimental conditions; in such cases I also find a vast multitude of DEGs (which also raises the spectre of whether the usual between-sample normalization assumptions are being violated, and whether there may be unequal variances between groups). Or, to consider the situation a different way: when a small number of DEGs exhibit a very large magnitude of variance, an *unscaled* global PCA may also show experimental clustering (again, driven solely by the variance of those DEGs). FYI, there are methods (such as the one implemented in the superpc package) that use the PCA loadings of PCs correlated with the experimental design to select DEGs. It's all quite circular.
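A similar sketch contrasts unscaled and scaled global PCA. Everything here is hypothetical: simulated data, 500 null genes, 3 DEGs shifted by 10 SD, and helper names invented for illustration. "Scaled" means dividing each centered gene by its standard deviation, analogous to `scale. = TRUE` in R's `prcomp`; the sketch itself uses a plain NumPy SVD.

```python
import numpy as np

def pc1_scores(x, scale):
    """PC1 scores for a genes x samples matrix; optionally scale each gene to unit variance."""
    centered = x - x.mean(axis=1, keepdims=True)
    if scale:
        # Scaled PCA: every gene contributes the same total variance.
        centered = centered / centered.std(axis=1, ddof=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, 0] * s[0]

def separation(scores, groups):
    """Absolute difference in group means divided by the pooled SD (Cohen's d)."""
    a, b = scores[groups == 0], scores[groups == 1]
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return abs(a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
n_null, n_de, n_per_group = 500, 3, 5
groups = np.repeat([0, 1], n_per_group)
expr = rng.normal(size=(n_null + n_de, 2 * n_per_group))
# Hypothetical effect: the first n_de genes shift by 10 SD in group 1,
# so their variance dwarfs that of every null gene.
expr[:n_de, groups == 1] += 10.0

unscaled_sep = separation(pc1_scores(expr, scale=False), groups)
scaled_sep = separation(pc1_scores(expr, scale=True), groups)
print(f"PC1 group separation, unscaled: {unscaled_sep:.2f}")
print(f"PC1 group separation, scaled:   {scaled_sep:.2f}")
```

With these settings the unscaled PC1 is expected to be dominated by the three high-variance DEGs and to split the groups, whereas after scaling those genes are just 3 out of 503 and the groups are not expected to separate along PC1.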

Either way, the presence or absence of sample clustering in PCA provides no more (or less) independent evidence of treatment effects than is already captured by the DEGs themselves, so I usually argue that such "DEG-focused" PCA representations are not particularly informative (or at least no more informative than a direct representation of the DEGs themselves). We use the global PCA for QC, i.e. discovery/confirmation of sample outliers, non-experimental batch effects, etc., but not for evaluating the experimental axes of interest.
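For the outlier-QC use of global PCA, one possible sketch flags samples whose global PC1 score lies far from the rest, using a median/MAD robust z-score with a cutoff of 3. The simulated artifact, the cutoff and the helper names `global_pc1` and `flag_outliers` are assumptions for illustration, not a procedure described in the thread.

```python
import numpy as np

def global_pc1(x):
    """PC1 scores from a centered (unscaled) PCA of a genes x samples matrix."""
    centered = x - x.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, 0] * s[0]

def flag_outliers(scores, threshold=3.0):
    """Indices whose robust z-score (median/MAD) exceeds the threshold (hypothetical cutoff)."""
    med = np.median(scores)
    # 1.4826 rescales the MAD so it estimates the SD under normality.
    mad = 1.4826 * np.median(np.abs(scores - med))
    robust_z = np.abs(scores - med) / mad
    return np.flatnonzero(robust_z > threshold)

rng = np.random.default_rng(7)
n_genes, n_samples, bad = 1000, 12, 3
expr = rng.normal(size=(n_genes, n_samples))
# Hypothetical artifact: half of the genes in one sample are shifted upward.
affected = rng.choice(n_genes, size=n_genes // 2, replace=False)
expr[affected, bad] += 4.0

flagged = flag_outliers(global_pc1(expr))
print("flagged samples:", flagged)
```

The sign of a PC is arbitrary, which is why the sketch uses the absolute robust z-score rather than looking only at one tail.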


