[BioC] Question about PCA and transformed data in DESeq2

amandine.fournier at chu-lyon.fr amandine.fournier at chu-lyon.fr
Thu Oct 10 11:16:19 CEST 2013

Dear Michael, Simon, Wolfgang and others,

I am a little bit confused about the count data transformations and the Principal Component Analysis in DESeq2.

In the latest vignette, the example on pages 18-19 shows a PCA plot of the samples, obtained from regularized log transformed data (rld).
But the R documentation for plotPCA says to use a SummarizedExperiment with transformed data produced by ‘varianceStabilizingTransformation’ (vst).
These two recommendations seem inconsistent, so I wonder which type of transformation I should use.

Moreover, when I run the PCA on my real dataset (one group of 2 patients and another group of 2 control cases), I see the following:
     - when no transformation is applied, axis 1 = pathology (patients vs control cases) and axis 2 = unknown factor
     - when transformed with the regularized log (rld), axis 1 = unknown factor and axis 2 = pathology
     - when transformed with the variance stabilizing transformation (vst), axis 1 = sex (girls vs boys) and axis 2 = unknown factor
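
The three comparisons above can be reproduced roughly as follows. This is only a sketch on simulated counts from makeExampleDESeqDataSet (2 samples per condition), not the patient data; the object names and the helper var_explained are illustrative. Note that prcomp here uses all genes, whereas plotPCA by default restricts itself to the most variable genes, so the axes need not match exactly.

```r
library(DESeq2)

# Simulated stand-in for the real data: 1000 genes, 2 samples per condition
# (hypothetical example data, not the patient/control dataset)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 4)
dds <- estimateSizeFactors(dds)

# 1. No transformation: PCA on the normalized counts
pca_raw <- prcomp(t(counts(dds, normalized = TRUE)))

# 2. Regularized log transformation (rld), as in the vignette
rld <- rlogTransformation(dds, blind = TRUE)
pca_rld <- prcomp(t(assay(rld)))

# 3. Variance stabilizing transformation (vst), as in ?plotPCA
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
pca_vst <- prcomp(t(assay(vsd)))

# Illustrative helper: percentage of variance carried by each axis
var_explained <- function(pca) round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

print(var_explained(pca_raw))
print(var_explained(pca_rld))
print(var_explained(pca_vst))

# The packaged plot, coloured by the design variable
plotPCA(rld, intgroup = "condition")
```

Comparing the three var_explained vectors, and which samples separate along axis 1 in each case, shows how much the choice of transformation alone changes the picture.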

So, I wonder whether the data are driven by the pathology or by the sex of the subjects. Is it incorrect to use untransformed data in PCA?
I don't really understand the usefulness of transforming the data since, as far as I understand, the transformed values are not used in the DE analysis afterwards.

Thank you in advance for your reply.
Best regards,

Amandine Fournier
Lyon Neuroscience Research Center
and Lyon Civil Hospitals (France)

