[R] reviewer comment

S Ellison S.Ellison at LGCGroup.com
Fri Mar 15 15:08:43 CET 2013


 

> My question: what does it mean that an asymmetric distribution could
> affect PCA? And that outliers could affect factors?

It means what it says. PCA will be affected by asymmetry, and outliers will affect the principal components (sometimes loosely called 'factors'). In particular, an extreme outlying data point can cause at least one PC to be essentially parallel to the vector between the outlier and the mean of the rest of the data. If you want a picture of factors describing the bulk of the data set, you need to chuck out the extreme points or use robust PCA.
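A minimal sketch of the outlier effect, written in Python/NumPy rather than R and not taken from the post. It assumes classical covariance-based PCA computed by SVD of the centred data. The data are synthetic: 200 points stretched along x, plus one hypothetical point far out along y. The helper names `first_pc` and `abs_cosine` are illustrative only.

```python
import numpy as np

def first_pc(X):
    """Unit vector of the first principal component (classical, covariance-based PCA)."""
    Xc = X - X.mean(axis=0)  # centre each column
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def abs_cosine(u, v):
    """|cos(angle)| between two vectors; 1.0 means parallel (a PC's sign is arbitrary)."""
    return abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
# Bulk of the data: 200 points, sd 3 along x and sd 1 along y
bulk = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
# One hypothetical extreme point far out along y
outlier = np.array([0.0, 100.0])
X = np.vstack([bulk, outlier])

pc_clean = first_pc(bulk)  # PCA after chucking out the extreme point
pc_all = first_pc(X)       # PCA with the outlier included
# Vector from the mean of the rest of the data to the outlier
to_outlier = outlier - bulk.mean(axis=0)

print("PC1 without outlier vs x-axis:     ", round(abs_cosine(pc_clean, [1.0, 0.0]), 3))
print("PC1 with outlier vs outlier vector:", round(abs_cosine(pc_all, to_outlier), 3))
```

Both printed values should be close to 1. Without the extra point, PC1 follows the long axis of the bulk. With it, a single observation out of 201 swings PC1 round to lie almost along the line from the bulk's mean to the outlier.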

Asymmetry I'd worry less about, at least for exploratory graphical presentation; if I had a nice spherical data set I'd probably not be very interested in the PCA, because it wouldn't have much discriminatory power for groups. But inference based on things like Mahalanobis distance often relies on some sense of multivariate normality or the like, and if the model used for inference isn't built on a symmetric data set, the inferences can be badly wrong. Think of the Turkish flag: the star is 'obviously' not part of the crescent, but in Mahalanobis distance it's not much further from the (empty) centre of the crescent than most of the crescent is.
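A rough numerical version of the Turkish-flag picture, again a Python/NumPy sketch rather than anything from the post. The geometry is an assumption, not the flag's official proportions: a stylised crescent made of a unit disc minus a disc of radius 0.8 centred at (0.25, 0), filled by rejection sampling, and a single hypothetical "star" point at (0.6, 0) in the crescent's opening. Distances use the crescent's own sample mean and covariance. The cutoff is the 97.5% quantile of a chi-square distribution with 2 degrees of freedom, the usual threshold if the data were bivariate normal.

```python
import numpy as np

def in_crescent(p):
    """True where points lie inside the outer disc but outside the inner disc."""
    p = np.atleast_2d(p)
    in_outer = p[:, 0] ** 2 + p[:, 1] ** 2 <= 1.0            # centre (0, 0), radius 1
    in_inner = (p[:, 0] - 0.25) ** 2 + p[:, 1] ** 2 <= 0.64  # centre (0.25, 0), radius 0.8
    return in_outer & ~in_inner

def mahalanobis_sq(points, mean, cov):
    """Squared Mahalanobis distance of each row of `points` from `mean`."""
    d = np.atleast_2d(points) - mean
    return np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)

rng = np.random.default_rng(1)
# Rejection sampling: uniform points on the square, keep those in the crescent
cand = rng.uniform(-1.0, 1.0, size=(20000, 2))
crescent = cand[in_crescent(cand)]

mean = crescent.mean(axis=0)
cov = np.cov(crescent, rowvar=False)
star = np.array([0.6, 0.0])  # hypothetical star position

d2_crescent = mahalanobis_sq(crescent, mean, cov)
d2_star = mahalanobis_sq(star, mean, cov)[0]
# Chi-square(2) has CDF 1 - exp(-x/2), so its 97.5% quantile is -2 ln(0.025), about 7.38
cutoff = -2.0 * np.log(0.025)

print("crescent mean:", mean.round(2), "inside crescent?", in_crescent(mean)[0])
print("star inside crescent?", in_crescent(star)[0])
print("star squared distance:", round(d2_star, 2), "cutoff:", round(cutoff, 2))
print("fraction of crescent points further out:", round(np.mean(d2_crescent > d2_star), 3))
```

With this set-up the crescent's mean itself falls outside the crescent, in the empty region. The star comes out below the chi-square cutoff, and a few percent of genuine crescent points (those in the horns) lie further out than it does. A normal-theory outlier rule would therefore not single out the star.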





More information about the R-help mailing list