[R] Axis scaling for PCA biplot
    Christian Hennig 
    Tue Nov 15 12:53:30 CET 2022
    
    
  
Hi there,
I'm puzzled about the axis scaling in the PCA biplot.
Here's an example.
library(pdfCluster) # package cepp seems to have the same data set.
data(oliveoil) # 572 observations 10 variables
olive <- oliveoil[,3:10] # numerical variables
prolive <- princomp(olive)
summary(prolive)
# Importance of components:
#                             Comp.1       Comp.2       Comp.3       Comp.4
# Standard deviation     479.7299024 150.82827868 45.394449751 27.522646558
# Proportion of Variance   0.8970072   0.08866821  0.008031707  0.002952451
# Cumulative Proportion    0.8970072   0.98567544  0.993707152  0.996659603
#                             Comp.5       Comp.6       Comp.7       Comp.8
# Standard deviation     24.78169442 1.196956e+01 7.1390744088 6.9756965249
# Proportion of Variance  0.00239367 5.584168e-04 0.0001986489 0.0001896608
# Cumulative Proportion   0.99905327 9.996117e-01 0.9998103392 1.0000000000
plot(prolive$scores)
# Scaling of this plot reproduces the variances of the components given
# in the summary, as does cov(prolive$scores). This seems all fine, however...
biplot(prolive)
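A small side check on the comparison with cov(): as far as I know, princomp computes its variances with divisor n, whereas cov() uses n - 1, so the two should agree only up to the factor (n - 1)/n. The sketch below uses the built-in USArrests data as a stand-in (an assumption, so that it runs without pdfCluster); the variable names are made up for illustration.

```r
# Stand-in data (assumption): built-in USArrests, so no extra package is needed.
pc <- princomp(USArrests)
n <- pc$n.obs

# Variances behind summary(pc): princomp uses divisor n.
var_summary <- unname(pc$sdev^2)
# Sample variances of the scores: cov() uses divisor n - 1.
var_scores <- unname(diag(cov(pc$scores)))

# Compare the two after adjusting for the different divisors.
print(all.equal(var_summary, var_scores * (n - 1) / n))
```

With 572 observations the factor is so close to 1 that the difference is invisible in a plot.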
I have no idea what the numbers on the axes of the biplot are, at least
not the larger ones. Chances are the smaller ones indicate the loadings.
The larger ones are neither the same as in the first plot, nor are they
standardised to one, but they seem to be standardised somehow, as the
ranges on the x- and y-axes look the same, which they shouldn't if the
variances reflected the PCA eigenvalues.
Can anyone explain this to me?
Actually the help page of biplot.princomp says something on this, but I
can't get my head around it:
"scale
The variables are scaled by lambda ^ scale and the observations are
scaled by lambda ^ (1-scale) where lambda are the singular values as
computed by princomp
<https://www.rdocumentation.org/link/princomp?package=stats&version=3.6.2>.
Normally 0 <= scale <= 1, and a warning will be issued if the
specified scale is outside this range."
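A minimal sketch of how the quoted rule could be checked numerically. It rests on my reading of the biplot.princomp source in stats, which the help text does not spell out: that lambda is sdev * sqrt(n.obs), that the scores are divided by lambda^scale, and that the loadings are multiplied by it. USArrests is a stand-in data set, and the names sc, lam, obs_xy and var_xy are made up for illustration.

```r
# Stand-in data (assumption): built-in USArrests instead of oliveoil.
pc <- princomp(USArrests)
sc <- 1          # default value of the 'scale' argument
choices <- 1:2   # the two components shown in the biplot

# lambda as assumed above: sdev * sqrt(n), raised to the power 'scale'.
lam <- (pc$sdev[choices] * sqrt(pc$n.obs))^sc

# Coordinates assumed to be drawn for observations and for variables.
obs_xy <- t(t(pc$scores[, choices]) / lam)
var_xy <- t(t(pc$loadings[, choices]) * lam)

# Compare lambda (for sc = 1) with the singular values of the centred data.
d <- svd(scale(as.matrix(USArrests), center = TRUE, scale = FALSE))$d[choices]
print(all.equal(unname(lam), d))

# Sum of squares per observation axis, and the ranges of both coordinate sets.
print(colSums(obs_xy^2))
print(apply(obs_xy, 2, range))
print(apply(var_xy, 2, range))
```

If this reading is right, "scaled by lambda ^ (1-scale)" refers to the unit-length singular vectors rather than to the raw scores, so with scale = 1 the observation axes would carry small, similarly ranged numbers and the variable axes the large ones.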
The default value of scale seems to be 1, but then (1-scale) is zero, so
I'd assume the observations are left unscaled, but that should have
reproduced the scale of the first plot(prolive$scores), shouldn't it?
Thanks,
Christian
-- 
Christian Hennig
Dipartimento di Scienze Statistiche "Paolo Fortunati",
Universita di Bologna, phone +39 05120 98163
christian.hennig using unibo.it