[R] Query regarding SVD of binary matrix:

Angel Russo angerusso1980 at gmail.com
Thu Jun 7 23:06:01 CEST 2012


Hello,

I have a binary matrix of 80k sets (each set is a combination of cities)
by 885 cities (dimension = 80k x 885). In the matrix, 1 means the city is
part of the set and 0 means the city is not part of the set.

Sets are rows and cities are columns (city.test).

I want to do feature reduction to keep only the important sets (most likely
2-10 sets of city combinations) and their associated cities. I chose SVD
and am following the steps below, but I am not sure how to go about the
next step. Could anyone help with this?

s <- svd(city.test)
D <- diag(s$d)
d2 <- (s$d)^2
ratio <- cumsum(d2/sum(d2))   # cumulative proportion of total variance across the 885 PCs

Looking at the plot, I see that roughly the first 10 to 20 PCs explain most
of the variation (please see the attached plot). How do I use this to
extract the most relevant sets from my original matrix? Could you please help?
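
For reference, a minimal sketch of the computation above on a small
simulated matrix. The names toy and k and the 90% cutoff are placeholders,
not part of the real data. Note that without centring the columns, d^2
gives shares of the total sum of squares rather than variance in the PCA
sense, so the sketch computes both versions.

```r
set.seed(1)
# Toy stand-in for city.test: 200 sets (rows) by 15 cities (columns), entries 0/1.
toy <- matrix(rbinom(200 * 15, 1, 0.3), nrow = 200, ncol = 15)

s     <- svd(toy)
d2    <- s$d^2
ratio <- cumsum(d2 / sum(d2))   # cumulative share of the total sum of squares

# Smallest number of components whose cumulative share reaches 90%
# (the 0.9 threshold is an arbitrary, hypothetical choice).
k <- which(ratio >= 0.9)[1]

# For a PCA-style "variance explained", centre the columns before the SVD.
s_c     <- svd(scale(toy, center = TRUE, scale = FALSE))
ratio_c <- cumsum(s_c$d^2 / sum(s_c$d^2))
```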

A friend of mine recommended plotting rowSums(abs(s$u*s$d)) and choosing
only the sets with the highest magnitude. I didn't understand the
significance of this. Most probably, it reflects that the first PC
contributes the most, hence we only care about rowSums(abs(u*d)). Is this
correct?
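
One caveat about the expression above, as far as I can tell: in R,
s$u * s$d recycles s$d in column-major order (down each column of s$u)
rather than scaling column j by s$d[j], and it does so without a warning
when the lengths divide evenly. A minimal sketch of the column-wise
version, restricted to the first k components, follows. Here toy, k = 3
and the cutoff of 10 rows are hypothetical choices, and ranking rows this
way is only one reading of the suggestion, not an established selection
rule.

```r
set.seed(1)
# Toy stand-in for city.test: 200 sets (rows) by 15 cities (columns).
toy <- matrix(rbinom(200 * 15, 1, 0.3), nrow = 200, ncol = 15)
s   <- svd(toy)
k   <- 3   # number of leading components to keep (hypothetical choice)

# Element-wise product: s$d is recycled down the rows, which is NOT U %*% D.
recycled <- s$u * s$d

# Column-wise scaling: column j of U multiplied by d[j], i.e. U %*% diag(d).
scores <- sweep(s$u, 2, s$d, "*")

# Row score using only the first k components, then the 10 largest rows.
row_score <- rowSums(abs(scores[, 1:k, drop = FALSE]))
top_sets  <- order(row_score, decreasing = TRUE)[1:10]
```

On the real matrix, svd(city.test, nu = k, nv = k) would avoid computing
singular vectors that are not used.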

Thanks.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: variance-cities.pdf
Type: application/pdf
Size: 24376 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120607/78e1ffca/attachment.pdf>

