[BioC] Question about PCA and transformed data in DESeq2

amandine.fournier at chu-lyon.fr
Thu Oct 10 11:16:19 CEST 2013

Dear Michael, Simon, Wolfgang and others,

I am a little bit confused about the count data transformations and the Principal Component Analysis in DESeq2.

In the latest vignette, the example on pages 18-19 shows a PCA plot of the samples obtained from regularized-log-transformed data (rld).
But the plotPCA R documentation says to use a SummarizedExperiment containing transformed data produced by ‘varianceStabilizingTransformation’ (vst).
These two sources disagree, so I wonder which transformation I should use.

Moreover, when I run the PCA on my real dataset (one group of 2 patients and another group of 2 control cases), I see the following:
     - with no transformation, axis 1 = pathology (patients vs control cases) and axis 2 = unknown factor
     - with the regularized log transformation (rld), axis 1 = unknown factor and axis 2 = pathology
     - with the variance stabilizing transformation (vst), axis 1 = sex (girls vs boys) and axis 2 = unknown factor

So I wonder whether the data are driven by the pathology or by the sex of the subjects. Is it incorrect to use untransformed data in a PCA?
I don't really understand the usefulness of transforming the data since, as far as I understand, the transformed data are not used in the differential expression (DE) analysis afterwards.
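
For illustration only, here is a minimal sketch of one reason the choice of transformation can change which genes drive the PCA axes. It does not use DESeq2: the counts are hypothetical toy numbers (not the dataset described above), and log2(count + 1) is assumed as a simple stand-in for a transformation, not rlog or vst. PCA axes follow the directions of largest variance, so the sketch compares how much of the total variance each gene contributes before and after the transformation.

```python
import math
from statistics import pvariance

# Hypothetical toy counts (NOT the real dataset): one row per gene, 4 samples.
counts = {
    "high_expr_gene": [10000, 12000, 9000, 11000],  # large counts, modest relative change
    "low_expr_gene_a": [5, 40, 6, 38],              # small counts, large fold change
    "low_expr_gene_b": [50, 8, 45, 10],             # small counts, large fold change
}


def variance_share(table):
    """Return each gene's fraction of the total across-sample variance."""
    variances = {gene: pvariance(values) for gene, values in table.items()}
    total = sum(variances.values())
    return {gene: var / total for gene, var in variances.items()}


# Share of variance on the raw counts.
raw_share = variance_share(counts)

# Share of variance after log2(count + 1), an assumed stand-in transformation.
log_counts = {
    gene: [math.log2(value + 1) for value in values]
    for gene, values in counts.items()
}
log_share = variance_share(log_counts)

for gene in counts:
    print(f"{gene}: raw={raw_share[gene]:.4f} log2={log_share[gene]:.4f}")
```

On raw counts the highly expressed gene carries almost all of the variance, so it alone would largely determine the first axis; after the log transformation the genes with large fold changes dominate instead. A plain log can in turn overweight noisy low-count genes, which is the kind of behaviour that transformations such as rlog and vst are meant to moderate.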

Thank you in advance for your reply.
Best regards,

Amandine Fournier
Lyon Neuroscience Research Center
and Lyon Civil Hospitals (France)
