[R] scale or not to scale that is the question - prcomp

Albyn Jones jones at reed.edu
Wed Aug 19 18:29:18 CEST 2009


Scaling changes the metric, i.e. which observations are close to each
other.  There is no reason to expect the picture to look the same when
you change the metric.
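
To make that concrete, here is a minimal sketch (not part of the
original exchange) that compares Euclidean distances before and after
scaling.  The matrix X holds the numeric columns of the rglp data quoted
below, with dus entered as its integer codes (1 = "ano", 2 = "ne"); the
helper nn() is a name made up for this illustration.

```r
# Numeric columns copied from the rglp data quoted below.
# dus is coded 1 = "ano", 2 = "ne", as as.numeric(dus) would give.
X <- cbind(
  iep   = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29,
            7.5, 7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
  sio2  = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898,
            0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
  al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828,
            4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
  p2o5  = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775,
            0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
  dus   = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1)
)

# Per-variable variances: without scaling, variables with larger
# variance contribute more to the Euclidean distance.
round(apply(X, 2, var), 3)

d_raw    <- dist(X)         # distances in the original units
d_scaled <- dist(scale(X))  # distances after centring and scaling to unit sd

# Hypothetical helper: index of each observation's nearest neighbour.
nn <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf            # ignore the zero distance to itself
  apply(m, 1, which.min)
}

# Side-by-side nearest neighbours under the two metrics.
cbind(raw = nn(d_raw), scaled = nn(d_scaled))
```

Rows where the two columns disagree are observations whose closest
neighbour depends on the choice of metric.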

On the other hand, your two pictures don't look so different to me.
It appears that the scaled plot is similar to the unscaled plot, with
the roles of the second and third PCs reversed, i.e. the scaled plot is
similar but rotated and distorted.  For example, the observations
forming the strip across the bottom of the first plot form a vertical
strip on the right hand side of the second plot.
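
One rough way to check whether components have swapped roles, sketched
here under the assumption that calling prcomp on a numeric matrix is an
acceptable stand-in for the formula call in the quoted message, is to
correlate the scores from the two fits.  The matrix X again holds the
numeric columns of rglp, with dus as integer codes; the names fit_raw,
fit_scaled and C are made up for this illustration.

```r
# Numeric columns copied from the rglp data quoted below.
# dus is coded 1 = "ano", 2 = "ne", as as.numeric(dus) would give.
X <- cbind(
  iep   = c(7.51, 7.79, 5.14, 6.35, 5.82, 7.13, 5.95, 7.27, 6.29,
            7.5, 7.3, 7.27, 6.46, 6.95, 6.32, 6.32, 6.34),
  sio2  = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 0.898,
            0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1),
  al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 5.828,
            4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 5),
  p2o5  = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 0.775,
            0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 2),
  dus   = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1)
)

fit_raw    <- prcomp(X)                 # centred, not scaled
fit_scaled <- prcomp(X, scale. = TRUE)  # centred and scaled to unit variance

# Rows are unscaled PCs, columns are scaled PCs.  The sign of a
# component is arbitrary, so compare absolute correlations.
C <- abs(cor(fit_raw$x, fit_scaled$x))
round(C, 2)
```

If the second and third components have traded places, the largest
entries in rows PC2 and PC3 should sit off the diagonal, in columns PC3
and PC2 respectively.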

albyn

On Wed, Aug 19, 2009 at 02:31:23PM +0200, Petr PIKAL wrote:
> Dear all
> 
> here is my data called "rglp"
> 
> structure(list(vzorek = structure(1:17, .Label = c("179/1/1", 
> "179/2/1", "180/1", "181/1", "182/1", "183/1", "184/1", "185/1", 
> "186/1", "187/1", "188/1", "189/1", "190/1", "191/1", "192/1", 
> "R310", "R610L"), class = "factor"), iep = c(7.51, 7.79, 5.14, 
> 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5, 7.3, 7.27, 6.46, 6.95, 
> 6.32, 6.32, 6.34), skupina = c(7.34, 7.34, 5.14, 6.23, 6.23, 
> 7.34, 6.23, 7.34, 6.23, 7.34, 7.34, 7.34, 6.23, 7.34, 6.23, 6.23, 
> 6.23), sio2 = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 
> 0.898, 0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1), 
>     p2o5 = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 
>     0.775, 0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 
>     2), al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 
>     5.828, 4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 
>     5), dus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
>     1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("ano", "ne"), class = 
> "factor")), .Names = c("vzorek", 
> "iep", "skupina", "sio2", "p2o5", "al2o3", "dus"), class = "data.frame", 
> row.names = c(NA, 
> -17L))
> 
> and I am trying to do a principal component analysis. Here is one without scaling:
> 
> fit <- prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp)
> biplot(fit, choices=2:3, xlabs=rglp$vzorek, cex=.8)
> 
> You can see that the data form 3 groups according to the variables sio2
> and dus, which seems reasonable, as the lowest group has a different
> value of dus ("ano") while the highest group has low values of sio2.
> 
> But when I do the same with scaling turned on
> 
> fit <- prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp,
>               scale.=TRUE)
> biplot(fit, choices=2:3, xlabs=rglp$vzorek, cex=.8)
> 
> I get a completely different picture, which cannot be interpreted so
> easily.
> 
> So can anybody advise me whether I should follow the recommendation
> from the help page, which says
> "The default is FALSE for consistency with S, but in general scaling is 
> advisable."
> or whether I should stay with scale = FALSE and the simply interpretable 
> result?
>  
> Thank you.
> 
> Petr Pikal
> petr.pikal at precheza.cz
> 