# [R] How Can SVD Reconstruct a Matrix

Fri Aug 15 02:52:08 CEST 2014

```On 15/08/2014 2:40 am, Peter Langfelder wrote:
> On Wed, Aug 13, 2014 at 11:57 PM, Peter Brady
> <subscriptions at simonplace.net> wrote:
>> > Hi All,
>> >
>> > I've inherited some R code that I can't work out what they've done.  It
>> > appears to work and give sort of reasonable answers, I'm just trying to
>> > work out why they've done what they have.  I suspect that this is a
>> > simple vector identity that I've just been staring at too long and have
>> > forgotten...
>> >
>> > The code:
>> >
>> > GGt <- M0 - M1 %*% M0inv %*% t(M1)
>> > svdGG <- svd(GGt)
>> > Gmat <- svdGG\$u %*% diag(sqrt(svdGG\$d))
>> >
>> > It is supposed to solve:
>> >
>> > G*G^T = M0 - M1*M0^-1*M1^T
>> >
>> > for G, where G^T is the transpose of G.  It is designed to reproduce a
>> > numerical method described in two papers:
>> >
>> > Srikanthan and Pegram, Journal of Hydrology, 371 (2009) 142-153,
>> > Equation A13, who suggest the SVD method but don't describe the
>> > specifics, eg: "...G is found by singular value decomposition..."
>> >
>> > Alternatively, Matalas (1967) Water Resources Research 3 (4) 937-945,
>> > Equation 17, say that the above can be solved using Principle Component
>> > Analysis (PCA).
>> >
>> > I use PCA (specifically POD) and SVD to look at the components after
>> > decomposition, so I'm a bit lost as to how the original matrix G can be
>> > constructed in this case from only the singular values and the left
>> > singular vectors.
> GG' is a symmetric matrix, so left- and right-singular vectors are the
> same. If I recall right, in general it is impossible to find G from
> GG' (I denote the transpose by ') since, given an orthogonal
> transformation U (that is, UU'=1), GUU'G' = GG', so you can only find
> G up to multiplication with an orthogonal transformation matrix.
>
> Since SVD decomposes a matrix X = UDV', the decomposition for GG' is
>
> GG' = UDU'; setting S = sqrt(D) (i.e., diagonal matrix with elements
> that are sqrt of those in D), GG' = USSU' = USS'U', so one solution is
> G = US which is the solution used.
>
> You could use PCA on G, which is roughly equivalent to doing SVD on
> GG' (up to centering and scaling of the columns of G). I am not very
> familiar with PCA in R since I always use SVD, but here's what the
> help file for prcomp (PCA in R) says:
>
>    The calculation is done by a singular value decomposition of the
>      (centered and possibly scaled) data matrix, not by using ‘eigen’
>      on the covariance matrix.  This is generally the preferred method
>      for numerical accuracy.

YES!  Thank you and to Martyn for the same comment.  The identity that I
was missing was that for a symmetric matrix the left and right singular
values are equal, i.e.:

U = V and, by extrapolation, U' == V'

Then, back to the text books, original papers and google I get the
following:

1) G itself is also symmetric as it is a spatial (and perhaps temporal)
correlation matrix.

2) PCA: "we basically try to find eigenvalues and eigenvectors of the
covariance matrix, C. We showed that C = (AA') / (n-1), and thus finding
the eigenvalues and eigenvectors of C is the same as finding the
eigenvalues and eigenvectors of AA'."

3) Further, from some algebra of SVD results it can be shown that from
A = U S V'
we can get:
AA' = U S^2 U'

Hence, in my notation scheme the PCA decomposition of GG' via SVD is:

GG' = U S^2 U'

and the corresponding SVD of G is:

G = U S V'

but as U'=V' I can substitute:

G = U S U'

Hence, I can compute G from the SVD of GG'.  This, unfortunately leads
to the conclusion that there is a bug in the code, which can be
verified, and fixed, in the sample code:

rm(list=ls())

# Make this repeatable for testing
set.seed(12003)

# Generate a dummy set of 1000 months of rain at 10 measurement stations
tsLength <- 1000;
nStations <- 10;
syntheticRain <- runif(tsLength * nStations, 0, 10)
syntheticRain <- matrix(syntheticRain, nrow = tsLength, ncol = nStations)

# Normalise the rainfall before further processing as is standard
practice in
# statistical hydrology. NB: this step should not matter though for the
method
# as the correlation will eliminate the scale and means.
syntheticRain <- scale(syntheticRain)

# Compute the spatial correlation of the stations
spatialCor <- cor(syntheticRain)

# Our dummy GGt:
GGt <- spatialCor %*% t(spatialCor)

# Now attempt the back calculation to verify
svdGGT <- svd(GGt)
spatialCorfromSVD1 <- svdGGT\$u %*% diag(sqrt(svdGGT\$d))
spatialCorfromSVD2 <- svdGGT\$u %*% diag(sqrt(svdGGT\$d)) %*% t(svdGGT\$u)

Thanks very much all
-pete

--