[R] Problem when creating matrix of values based on covariance matrix

Boel Brynedal brynedal at gmail.com
Sat Aug 11 16:17:53 CEST 2012


Hi,

I want to simulate a data set with a covariance structure similar to
that of my observed data, and have calculated a covariance matrix
(dimensions 8368 x 8368). So far I've tried two approaches to simulating
data: rmvnorm from the mvtnorm package, and the Cholesky decomposition
(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
The problem is that the resulting covariance structure in my simulated
data is very different from the covariance matrix I supplied.
Let's just look at some of the values:

> cov8[1:4,1:4] # covariance of simulated data
            X1          X2         X3         X4
X1 34515296.00    99956.69   369538.1  1749086.6
X2    99956.69 34515296.00  2145289.9  -624961.1
X3   369538.08  2145289.93 34515296.0  -163716.5
X4  1749086.62  -624961.09  -163716.5 34515296.0
> CEUcovar[1:4,1:4]
             [,1]         [,2]          [,3]         [,4]
[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488

So the observed covariances are many orders of magnitude smaller than
the covariances of the simulated data.

None of the eigenvalues of the observed covariance matrix are
negative, and it appears to be a positive definite matrix. Here is
what I did to create the simulated data:

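Chol <- chol(CEUcovar)                     # upper-triangular R, t(R) %*% R == CEUcovar
Z <- matrix(rnorm(20351 * 8368), 8368)     # 8368 x 20351 standard normal draws
X <- t(Chol) %*% Z                         # each column is one simulated observation
sample8 <- data.frame(as.matrix(t(X)))     # observations in rows, variables in columns
> dim(sample8)
[1] 20351  8368
cov8 <- cov(sample8, method = 'spearman')  # rank-based (Spearman) covariance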
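
One thing that may be worth checking is the method argument in the last
line above. The sketch below repeats the same Cholesky recipe on a small
made-up matrix and compares cov() with its default (Pearson) method
against method = "spearman", which works on the ranks of each column.
Sigma, n and the other names are illustrative assumptions, not the real
data.

```r
set.seed(1)

# Hypothetical 3 x 3 covariance matrix standing in for CEUcovar
Sigma <- matrix(c(0.19, 0.01, 0.00,
                  0.01, 0.19, 0.01,
                  0.00, 0.01, 0.19), nrow = 3)

n <- 2000                                    # number of simulated observations
R <- chol(Sigma)                             # upper triangular, t(R) %*% R == Sigma
Z <- matrix(rnorm(n * nrow(Sigma)), nrow(Sigma))
X <- t(t(R) %*% Z)                           # n x 3, one simulated draw per row

cov_pearson  <- cov(X)                       # default method = "pearson"
cov_spearman <- cov(X, method = "spearman")  # covariance of the column-wise ranks

round(cov_pearson, 3)                        # should be close to Sigma
diag(cov_spearman)                           # variance of the ranks 1..n
n * (n + 1) / 12                             # closed form for that rank variance
20351 * 20352 / 12                           # same formula with n = 20351
```

If the same holds at full size, the diagonal value 34515296 in cov8
would simply be the variance of the ranks 1..20351 rather than anything
on the scale of CEUcovar, so comparing CEUcovar against cov(sample8)
with the default method may be the more meaningful check.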

[Earlier I also tried sample8 <- rmvnorm(1000,
mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen"), with
equally 'bad' results: much larger covariance values in the simulated
data.]

Any ideas of WHY the simulated data have such a different covariance?
Any experience with similar issues? Would be happy to supply the
covariance matrix if anyone wants to give it a try.
Any suggestions? Anything apparent that I left out or neglected?

Any advice would be highly appreciated.
Best,
Bo


