[R] R help

peter dalgaard pdalgd at gmail.com
Sun Feb 19 18:40:52 CET 2012


On Feb 19, 2012, at 16:53 , li li wrote:

> Petr,
>   Thanks for the help. That certainly makes sense and solves my current
> problem. But, in general, for arbitrary covariance matrix (instead of the
> special equi-correlation case), it there a way to generate numbers from
> multivariate normal when the dimension is very high?


In the general case, there is really no alternative to an algorithm on the order of p^2 in space and p^2 per replication in time. The covariance matrix is of size p*(p+1)/2 and you will have to multiply by its square root for each replicate. If you run out of space, you are out of luck. Any potential speedups will have to rely on special structure of the covariance.

>   Thanks.
>         Hannah
> 
> ÔÚ 2012Äê2ÔÂ19ÈÕ ÉÏÎç5:01£¬Petr Savicky <savicky at cs.cas.cz>‹´µÀ£º
> 
>> On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:
>>> Dear all,
>>>  I need to generate numbers from multivariate normal with large
>> dimensions
>>> (5,000,000).
>>> Below is my code and the error I got from R. Sigma in the code is the
>>> covariance
>>> matrix. Can anyone give some idea on how to take care of this error.
>> Thank
>>> you.
>>>                Hannah
>>> 
>>>> m <- 5000000
>>>> m1 <- 0.5*m
>>>> rho <- 0.5
>>>> Sigma <- rho* matrix(1, m, m)+diag(1-rho, m)
>>> Error in matrix(1, m, m) : too many elements specified
>> 
>> Hi.
>> 
>> The matrix of dimension m times m does not fit into memory,
>> since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB.
>> 
>> Generating a multivariate normal with a covariance matrix
>> with 1 on the diagonal and rho outside of the diagonal may
>> be done also as follows.
>> 
>> m <- 10 # can be 5000000
>> rho <- 0.5
>> # single vector
>> x <- rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho))
>> 
>> # several vectors
>> a <- t(replicate(10000, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 -
>> rho))))
>> 
>> # check the sample covariance matrix if m is not too large
>> sigma <- cov(a)
>> range(diag(sigma)) # elements on the diagonal
>> 
>> [1] 0.9963445 1.0196015
>> 
>> diag(sigma) <- NA
>> range(sigma, na.rm=TRUE) # elements outside of the diagonal
>> 
>> [1] 0.4935129 0.5162836
>> 
>> Generating several vectors using replicate() may not be efficient.
>> The following can be used instead.
>> 
>> n <- 10000
>> a <- matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n,
>> sd=sqrt(rho))
>> 
>> Note that the size of "a" is n times m and it should fit into the memory.
>> 
>> Hope this helps.
>> 
>> Petr Savicky.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list