[R] Unexpected scores from weighted PCA with svyprcomp()

Leonardo Ferreira Fontenelle leonardof at leonardof.med.br
Wed May 4 00:34:15 CEST 2016


Is there something I could do to improve my chances of getting an
answer?

Leonardo Ferreira Fontenelle
http://lattes.cnpq.br/9234772336296638

Em Sex 29 abr. 2016, às 23:40, Leonardo Ferreira Fontenelle escreveu:
> Hello!
> 
> I'd like to create an assets-based economic indicator using data from a
> national household survey. The economic indicator is to be the first
> principal component from a principal components analysis, which (given
> the source of the data) I believe should take in consideration the
> sampling weights of the observations. After running the PCA with
> svyprcomp(), from the survey package, I wanted to list the loading (with
> regard to the first principal component) and the scale of the variables,
> so that I can tell people how to "reconstitute" the economic indicator
> from the variables without any knowledge of PCA. This reconstituted
> indicator wouldn't be centered, but that's OK because the important
> thing for the application is the relative position of the observations.
> The unexpected (at least for me) behavior was that the principal
> component returned by svyprcomp() was very different from from the
> reconstituted indicator as well as from the indicator returned by
> predict(). "Different" here means weak correlation and different
> distributions.
> 
> I hope the following code illustrates what I mean:
> 
> =====
> 
> svycor <- function(formula, design) {
>   # https://stat.ethz.ch/pipermail/r-help/2003-July/036645.html
>   stopifnot(require(survey))
>   covariance.matrix <- svyvar(formula, design)
>   variables <- diag(covariance.matrix)
>   correlation.matrix <- covariance.matrix / sqrt(variables %*%
>   t(variables))
>   return(correlation.matrix)
> }
> 
> library(survey)
> data(api)
> dclus2 <- svydesign(ids = ~ dnum + snum, fpc = ~ fpc1 + fpc2, data =
> apiclus2)
> pc <- svyprcomp( ~ api99 + api00, design = dclus2, scale = TRUE, scores
> = TRUE)
> dclus2$variables$pc1 <- pc$x[, "PC1"]
> dclus2$variables$pc2 <- predict(pc, apiclus2)[, "PC1"]
> mycoef <- pc$rotation[, "PC1"] / pc$scale
> dclus2$variables$pc3 <- with(apiclus2, api99 * mycoef["api99"] + api00 *
> mycoef["api00"])
> svycor(~ pc1 + pc2 + pc3, dclus2)[, ]
> #           pc1       pc2       pc3
> # pc1 1.0000000 0.2078789 0.2078789
> # pc2 0.2078789 1.0000000 1.0000000
> # pc3 0.2078789 1.0000000 1.0000000
> plot(svysmooth(~ pc1, dclus2), xlim = c(-2.5, 5), ylim = 0:1)
> lines(svysmooth(~ pc2, dclus2), col = 2)
> lines(svysmooth(~ pc3, dclus2), col = 3)
> legend("topright", legend = c('pc$x[, "PC1"]', 'predict(pc, apiclus2)[,
> "PC1"]', 'Reconstituted indicator'), col = 1:3, lty = 1)
> 
> sessionInfo()
> # R version 3.2.4 Revised (2016-03-16 r70336)
> # Platform: x86_64-pc-linux-gnu (64-bit)
> # Running under: Arch Linux
> # 
> #  locale:
> #  [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C              
> #  [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=pt_BR.UTF-8    
> #  [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=pt_BR.UTF-8   
> #  [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C                 
> #  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> # [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       
> # 
> # attached base packages:
> # [1] grid      stats     graphics  utils     datasets  grDevices
> # [7] methods   base     
> # 
> # other attached packages:
> # [1] KernSmooth_2.23-15 survey_3.30-3     
> # 
> # loaded via a namespace (and not attached):
> # [1] tools_3.2.4
> 
> =====
> 
> This lack of correlation doesn't happen if the survey design object has
> uniform sampling weights or if the the data is analyzed with prcomp().
> 
> Why does the returned principal component is so different from the
> predicted and the reconstituted ones? Are predict() and my
> "reconstitution" missing something? Are the three methods equally valid
> but with different interpretations? Is there a bug in svyprcomp() ??
> 
> Thanks in advance,
> 
> Leonardo Ferreira Fontenelle
> http://lattes.cnpq.br/9234772336296638
> 
> ______________________________________________
> R-help em r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list