[R] simulate data based on partial correlation matrix

Moshe Olshansky m_olshansky at yahoo.com
Tue Aug 5 10:05:20 CEST 2008


Hi Benjamin,

Creating 0 correlations is easier and always possible, but creating arbitrary correlations can be done as well (when possible - see below).
Suppose that x1,x2,x3,x4 have mean 0 and suppose that the desired correlations are r = (r1,r2,r3,r4). Let A be an orthogonal 4x4 matrix such that (y1,y2,y3,y4) = (x1,x2,x3,x4)*A are orthonormal (i.e. the norm of y1,y2,y3,y4 is 1 and they are orthogonal to each other - this can be done if x1,x2,x3,x4 are independent). Then the correlations between z and y1,y2,y3,y4 should be q = (q1,q2,q3,q4) = (r1,r2,r3,r4)*A. Let now v be any vector which has norm 1, mean 0 and is orthogonal to y1,y2,y3,y4 (it exists and can be easily found - see below). Now let z = a1*y1 + a2*y2 + a3*y3 +a4*y4 + a*v where a1,a2,a3,a4,a are scalars and assume that the norm of z is 1. In order for the correlations between z and y1,y2,y3,y4 to be q1,q2,q3,q4 we must have a1=q1,a2=q2,a3=q3,a4=q4. Moreover, the square of the norm of z is q1^2 + q2^2 + q3^2 + q4^2 + a^2 = 1, which means that q1^2 + q2^2 + q3^2 + q4^2 <= 1.
This condition is necessary and sufficient for a solution to exist!!!
So now we need to things: (a) to find the matrix A and (b) to find the vector v.
(a) Let M = (x1,x2,x3,x4) be a nx4 matrix, Then t(M)*M is a 4x4 matrix which is positive definite if x1,x2,x3,x4 are independent (let's assume this). Then the exists a unitary matrix U such that t(M)*M = t(U)*D*U where D is a diagonal matrix (U and D can be found using the function eigen). In such a case A = t(U)*(1/sqrt(D))*U.
(b) To find v, test Vi = (-1,0,..,0,1,0,0,...,0) where the 1 is at place i. We need to find Vi which is independent of y1,y2,y3,y4. Most probably any of V2,V3,... is independent of y1,...,y4, but in any case at least one of V2,V3,V4,V5,V6 is independent of them. To check which one compute (y1%*%V)^2 + ... + (y4%*%V)^2 (where %*% is dot product). If this sum is noticeably less than 2 (= V%*%V) then this is what we need (let's say it is less than 1.99). Now take v = V - (V%*%y1)*y1 - (V%*%y2)*y2- (V%*%y3)*y3 - (V%*%y4)*y4 and normalize v so that it's norm is 1.

Regards,

Moshe.


--- On Tue, 5/8/08, Benjamin Michael Kelcey <bkelcey at umich.edu> wrote:

> From: Benjamin Michael Kelcey <bkelcey at umich.edu>
> Subject: [R] simulate data based on partial correlation matrix
> To: r-help at r-project.org
> Received: Tuesday, 5 August, 2008, 6:24 AM
> Given four known and fixed vectors, x1,x2,x3,x4, I am trying
> to  
> generate a fifth vector,z, with specified known and fixed
> partial  
> correlations.
> How can I do this?
> 
> In the past I have used the following (thanks to Greg Snow)
> to  
> generate a fifth vector based on zero order
> correlations---however I'd  
> like to modify it so that it can generate a fifth vector
> with specific  
> partial correlations rather than zero order correlations:
> 
> # create x1-x4
> x1 <- rnorm(100, 50, 3)
> x2 <- rnorm(100) + x1/5
> x3 <- rnorm(100) + x2/5
> x4 <- rnorm(100) + x3/5
> 
> # find current correlations
> cor1 <- cor( cbind(x1,x2,x3,x4) )
> cor1
> 
> # create 1st version of z
> z <- rnorm(100)
> # combine in a matrix
> m1 <- cbind( x1, x2, x3, x4, z )
> 
> # center and scale
> m2 <- scale(m1)
> 
> # find cholesky decomp
> c1 <- chol(var(m2))
> 
> # force to be independent
> m3 <- m2 %*% solve(c1)
> 
> # create new correlation matrix:
> cor2 <- cbind( rbind( cor1, z=c(.5,.3,.1,.05) ),
> z=c(.5,.3,.1,.05,1) )
> 
> # create new matrix
> m4 <- m3 %*% chol(cor2)
> 
> # uncenter and unscale
> m5 <- sweep( m4, 2, attr(m2, 'scaled:scale'),
> '*')
> m5 <- sweep( m5, 2, attr(m2, 'scaled:center'),
> '+')
> 
> ##Check they are equal
> zapsmall(cor(m5))==zapsmall(cor2)
> 
> Thanks, ben
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.



More information about the R-help mailing list