[R] Problem when creating matrix of values based on covariance matrix

Bert Gunter gunter.berton at gene.com
Sat Aug 11 16:27:28 CEST 2012


Sampling error? Do you realize how large a sample size you would
need to estimate an 8000 x 8000 covariance matrix precisely? It probably
exceeds the number of stars in our galaxy...
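
To put a rough number on the sampling-error point: for multivariate normal data, the unbiased sample covariance s_ij has variance (sigma_ij^2 + sigma_ii * sigma_jj) / (n - 1). The sketch below plugs in the first two variances and their covariance from the CEUcovar excerpt quoted further down, with n = 20351 (the number of simulated draws in the original post). It then checks the formula by simulation on a 2 x 2 case. It assumes normality, and cov_se is a hypothetical helper written only for this illustration.

```r
# Approximate standard error of one entry of the sample covariance matrix,
# assuming multivariate normal data: Var(s_ij) = (sigma_ij^2 + sigma_ii * sigma_jj) / (n - 1).
# cov_se is a hypothetical helper, not part of any package.
cov_se <- function(sigma_ii, sigma_jj, sigma_ij, n) {
  sqrt((sigma_ij^2 + sigma_ii * sigma_jj) / (n - 1))
}

# Values taken from CEUcovar[1:2, 1:2] in the original post
se12 <- cov_se(0.1873402987, 0.188665853, 0.0018372286, n = 20351)
se12  # compare with the target off-diagonal value of about 0.0018

# Monte Carlo check on a 2 x 2 case with the same covariance
set.seed(1)
Sigma <- matrix(c(0.1873402987, 0.0018372286,
                  0.0018372286, 0.188665853), 2, 2)
R <- chol(Sigma)   # upper triangular, so t(R) %*% R equals Sigma
n <- 2000          # smaller n keeps the check fast
reps <- 500
s12 <- replicate(reps, {
  X <- matrix(rnorm(n * 2), n, 2) %*% R   # rows are draws from N(0, Sigma)
  cov(X)[1, 2]
})
c(simulated = sd(s12),
  formula   = cov_se(Sigma[1, 1], Sigma[2, 2], Sigma[1, 2], n))
```

With these numbers the standard error comes out at around 0.0013, the same order of magnitude as the off-diagonal entries of CEUcovar themselves, so entry-by-entry agreement is not to be expected even with 20351 draws.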

Numerical issues may also play a role, but I am too ignorant of this
aspect to offer advice.
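
On the numerical side, a quick sanity check is to confirm that the target matrix is symmetric, has a strictly positive smallest eigenvalue, and is not badly conditioned. The sketch below uses only the 4 x 4 block of CEUcovar quoted further down. It symmetrizes that block because the printed values are rounded. For the real 8368 x 8368 matrix the same base-R calls apply, though eigen() and kappa(exact = TRUE) may take a while.

```r
# CEUcovar[1:4, 1:4] as quoted in the original post; the printed values are
# rounded, so average with the transpose to make the matrix exactly symmetric
S <- matrix(c(0.1873402987,  0.001837229,  0.0009009272,  0.010324521,
              0.0018372286,  0.188665853,  0.0124216535, -0.001755035,
              0.0009009272,  0.012421654,  0.1867835412, -0.000142395,
              0.0103245214, -0.001755035, -0.0001423950,  0.192883488),
            4, 4, byrow = TRUE)
S <- (S + t(S)) / 2

ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
min(ev)                  # must be > 0 for a positive definite matrix
kappa(S, exact = TRUE)   # 2-norm condition number; huge values signal numerical trouble

# chol() stops with an error if S is not numerically positive definite
is_pd <- !inherits(try(chol(S), silent = TRUE), "try-error")
is_pd
```

For this small, diagonally dominant block the smallest eigenvalue is clearly positive and the condition number is close to 1. Whether the same holds for the full matrix is what these calls would show.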

Finally, this is really not an R question, so you would probably do
better to post on a stats site like stats.stackexchange.com rather
than here.

-- Bert

On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <brynedal at gmail.com> wrote:
> Hi,
>
> I want to simulate a data set with a covariance structure similar to that
> of my observed data, and have calculated a covariance matrix (dimensions
> 8368 x 8368). So far I've tried two approaches to simulating data:
> rmvnorm from the mvtnorm package, and the Cholesky
> decomposition (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
> The problem is that the covariance structure of my simulated
> data is very different from the originally supplied covariance matrix.
> Let's just look at some of the values:
>
>> cov8[1:4,1:4] # covariance of simulated data
>             X1          X2         X3         X4
> X1 34515296.00    99956.69   369538.1  1749086.6
> X2    99956.69 34515296.00  2145289.9  -624961.1
> X3   369538.08  2145289.93 34515296.0  -163716.5
> X4  1749086.62  -624961.09  -163716.5 34515296.0
>> CEUcovar[1:4,1:4]
>              [,1]         [,2]          [,3]         [,4]
> [1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
> [2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
> [3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
> [4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>
> So the observed covariance values span a very narrow range compared
> to those of the simulated data.
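
Because the two matrices are on very different scales, one way to compare their structure is to rescale both to correlations with cov2cor() from the stats package. The sketch below simply re-enters the two 4 x 4 excerpts printed above. The object names cov8_sub and CEU_sub are chosen here for illustration, and nothing beyond those 32 printed values is assumed.

```r
# 4 x 4 excerpts copied from the output above
cov8_sub <- matrix(c(34515296.00,    99956.69,   369538.1,  1749086.6,
                        99956.69, 34515296.00,  2145289.9,  -624961.1,
                       369538.08,  2145289.93, 34515296.0,  -163716.5,
                      1749086.62,  -624961.09,  -163716.5, 34515296.0),
                   4, 4, byrow = TRUE)
CEU_sub <- matrix(c(0.1873402987,  0.001837229,  0.0009009272,  0.010324521,
                    0.0018372286,  0.188665853,  0.0124216535, -0.001755035,
                    0.0009009272,  0.012421654,  0.1867835412, -0.000142395,
                    0.0103245214, -0.001755035, -0.0001423950,  0.192883488),
                  4, 4, byrow = TRUE)

# Rescale to correlations so both are on a common, unit-free scale
round(cov2cor(cov8_sub), 4)
round(cov2cor(CEU_sub), 4)
```

On the correlation scale, every off-diagonal entry of both excerpts is below 0.1 in absolute value. The very large raw numbers in cov8 therefore mostly reflect the size of its diagonal.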
>
> None of the eigenvalues of the observed covariance matrix are
> negative, and it appears to be a positive definite matrix. Here is
> what I did to create the simulated data:
>
> Chol <- chol(CEUcovar)
> Z <- matrix(rnorm(20351 * 8368), 8368)
> X <- t(Chol) %*% Z
> sample8 <- data.frame(as.matrix(t(X)))
>> dim(sample8)
> [1] 20351  8368
> cov8=cov(sample8,method='spearman')
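
One detail in the code above is worth checking. cov(..., method = 'spearman') returns the covariance of the column ranks, not of the simulated values themselves. For n untied observations the ranks 1..n have sample variance n(n + 1)/12. For n = 20351 that is exactly 34515296, the diagonal of cov8. The sketch below repeats the same Cholesky construction with n = 20351. To keep it fast it uses only the 4 x 4 block of CEUcovar (an assumption, not the full matrix). It then compares the Pearson and Spearman covariances with the target.

```r
set.seed(42)
# CEUcovar[1:4, 1:4] from the post, symmetrized because the printed values are rounded
Sigma <- matrix(c(0.1873402987,  0.001837229,  0.0009009272,  0.010324521,
                  0.0018372286,  0.188665853,  0.0124216535, -0.001755035,
                  0.0009009272,  0.012421654,  0.1867835412, -0.000142395,
                  0.0103245214, -0.001755035, -0.0001423950,  0.192883488),
                4, 4, byrow = TRUE)
Sigma <- (Sigma + t(Sigma)) / 2

n <- 20351                       # same number of draws as in the post
p <- ncol(Sigma)
R <- chol(Sigma)                 # upper triangular, so t(R) %*% R equals Sigma
Z <- matrix(rnorm(p * n), p)     # p x n matrix of independent N(0, 1) draws
X <- t(t(R) %*% Z)               # n x p; each row is a draw from N(0, Sigma)

cov_pearson  <- cov(X)                       # estimates Sigma on the original scale
cov_spearman <- cov(X, method = "spearman")  # covariance of the column ranks

round(cov_pearson, 4)
diag(cov_spearman)               # each equals n * (n + 1) / 12 when there are no ties
n * (n + 1) / 12                 # 34515296, the diagonal of cov8 above
```

A like-for-like check is therefore the Pearson cov(sample8) against CEUcovar. Absent ties, the Spearman version has n(n + 1)/12 on its diagonal whatever the target matrix is.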
>
> [Earlier I also tried sample8 <- rmvnorm(1000,
> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen"), with equally
> 'bad' results: much larger covariance values in the simulated data.]
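
Two points are relevant to that earlier attempt. First, with 1000 draws of 8368 variables, the sample covariance matrix has rank at most 999, so it cannot reproduce a full-rank 8368 x 8368 target. Second, method = "eigen" builds the draws from the eigendecomposition Sigma = V diag(lambda) t(V). The base-R sketch below writes that idea out by hand. It is not mvtnorm's actual code. It uses a hypothetical 50-variable AR(1)-style target to show the rank limit when there are fewer draws than variables.

```r
set.seed(7)
# Hypothetical target: p = 50 variables with an AR(1)-style covariance
p <- 50
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))

# Eigen-based sampler: X = Z %*% diag(sqrt(lambda)) %*% t(V) gives cov(X) close to Sigma
eig <- eigen(Sigma, symmetric = TRUE)
sim_eigen <- function(n) {
  Z <- matrix(rnorm(n * p), n, p)
  Z %*% diag(sqrt(eig$values)) %*% t(eig$vectors)
}

# Fewer draws than variables: the sample covariance is rank-deficient
X_small <- sim_eigen(20)
qr(cov(X_small))$rank   # at most 20 - 1 = 19, far below p = 50

# Many more draws than variables: the estimate approaches Sigma
X_big <- sim_eigen(20000)
max(abs(cov(X_big) - Sigma))
```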
>
> Any ideas WHY the simulated data have such a different covariance?
> Any experience with similar issues? I would be happy to supply the
> covariance matrix if anyone wants to give it a try.
> Any suggestions? Anything obvious that I left out or neglected?
>
> Any advice would be highly appreciated.
> Best,
> Bo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


