[R] Unexpected scores from weighted PCA with svyprcomp()
Leonardo Ferreira Fontenelle
leonardof at leonardof.med.br
Wed May 4 00:34:15 CEST 2016
Is there something I could do to improve my chances of getting an
answer?
Leonardo Ferreira Fontenelle
http://lattes.cnpq.br/9234772336296638
Em Sex 29 abr. 2016, às 23:40, Leonardo Ferreira Fontenelle escreveu:
> Hello!
>
> I'd like to create an assets-based economic indicator using data from a
> national household survey. The economic indicator is to be the first
> principal component from a principal components analysis, which (given
> the source of the data) I believe should take in consideration the
> sampling weights of the observations. After running the PCA with
> svyprcomp(), from the survey package, I wanted to list the loading (with
> regard to the first principal component) and the scale of the variables,
> so that I can tell people how to "reconstitute" the economic indicator
> from the variables without any knowledge of PCA. This reconstituted
> indicator wouldn't be centered, but that's OK because the important
> thing for the application is the relative position of the observations.
> The unexpected (at least for me) behavior was that the principal
> component returned by svyprcomp() was very different from from the
> reconstituted indicator as well as from the indicator returned by
> predict(). "Different" here means weak correlation and different
> distributions.
>
> I hope the following code illustrates what I mean:
>
> =====
>
> svycor <- function(formula, design) {
> # https://stat.ethz.ch/pipermail/r-help/2003-July/036645.html
> stopifnot(require(survey))
> covariance.matrix <- svyvar(formula, design)
> variables <- diag(covariance.matrix)
> correlation.matrix <- covariance.matrix / sqrt(variables %*%
> t(variables))
> return(correlation.matrix)
> }
>
> library(survey)
> data(api)
> dclus2 <- svydesign(ids = ~ dnum + snum, fpc = ~ fpc1 + fpc2, data =
> apiclus2)
> pc <- svyprcomp( ~ api99 + api00, design = dclus2, scale = TRUE, scores
> = TRUE)
> dclus2$variables$pc1 <- pc$x[, "PC1"]
> dclus2$variables$pc2 <- predict(pc, apiclus2)[, "PC1"]
> mycoef <- pc$rotation[, "PC1"] / pc$scale
> dclus2$variables$pc3 <- with(apiclus2, api99 * mycoef["api99"] + api00 *
> mycoef["api00"])
> svycor(~ pc1 + pc2 + pc3, dclus2)[, ]
> # pc1 pc2 pc3
> # pc1 1.0000000 0.2078789 0.2078789
> # pc2 0.2078789 1.0000000 1.0000000
> # pc3 0.2078789 1.0000000 1.0000000
> plot(svysmooth(~ pc1, dclus2), xlim = c(-2.5, 5), ylim = 0:1)
> lines(svysmooth(~ pc2, dclus2), col = 2)
> lines(svysmooth(~ pc3, dclus2), col = 3)
> legend("topright", legend = c('pc$x[, "PC1"]', 'predict(pc, apiclus2)[,
> "PC1"]', 'Reconstituted indicator'), col = 1:3, lty = 1)
>
> sessionInfo()
> # R version 3.2.4 Revised (2016-03-16 r70336)
> # Platform: x86_64-pc-linux-gnu (64-bit)
> # Running under: Arch Linux
> #
> # locale:
> # [1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
> # [3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=pt_BR.UTF-8
> # [5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=pt_BR.UTF-8
> # [7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
> # [9] LC_ADDRESS=C LC_TELEPHONE=C
> # [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
> #
> # attached base packages:
> # [1] grid stats graphics utils datasets grDevices
> # [7] methods base
> #
> # other attached packages:
> # [1] KernSmooth_2.23-15 survey_3.30-3
> #
> # loaded via a namespace (and not attached):
> # [1] tools_3.2.4
>
> =====
>
> This lack of correlation doesn't happen if the survey design object has
> uniform sampling weights or if the the data is analyzed with prcomp().
>
> Why does the returned principal component is so different from the
> predicted and the reconstituted ones? Are predict() and my
> "reconstitution" missing something? Are the three methods equally valid
> but with different interpretations? Is there a bug in svyprcomp() ??
>
> Thanks in advance,
>
> Leonardo Ferreira Fontenelle
> http://lattes.cnpq.br/9234772336296638
>
> ______________________________________________
> R-help em r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list