[R] generating correlated random variables from different distributions

Greg Snow Greg.Snow at imail.org
Thu Apr 29 18:31:43 CEST 2010


The method you are using (multiply by cholesky) works for normal distributions, but not necessarily for others (if you want different means/sd, then add/multiply after transforming).

For other distributions this process can sometimes give the correlation you want, but may change the variable(s) to no longer have the desired distribution.

The short answer to your question is "It Depends", the full long answer could fill a full semester course.  If you tell us more of your goal we may be able to give a more useful answer.  The copula package is one possibility.  If you know the conditional distribution of each variable given the others then you can use gibbs sampling.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Richard and Barbara Males
> Sent: Thursday, April 29, 2010 9:18 AM
> To: r-help at r-project.org
> Subject: [R] generating correlated random variables from different
> distributions
> 
> I need to generate a set of correlated random variables for a Monte
> Carlo simulation.   The solutions I have found
> (http://www.stat.uiuc.edu/stat428/cndata.html,
> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers), using
> Cholesky Decomposition, seem to work only if the variables come from
> the same distribution with the same parameters.  My situation is that
> each variable may be described by a different distribution (or
> different parameters of the same distribution).  This approach does
> not seem to work, see code and results below.  Am I missing something
> here?  My math/statistics is not very good, will I need to generate
> correlated uniform random variables on (0,1) and then use the inverse
> distributions to get the desired results I am looking for?  That is
> acceptable, but I would prefer to just generate the individual
> distributions and then correlate them.  Any advice much appreciated.
> Thanks in advance
> 
> R. Males
> Cincinnati, Ohio, USA
> 
> Sample Code:
> # Testing Correlated Random Variables
> 
> # reference
> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers
> # reference http://www.stat.uiuc.edu/stat428/cndata.html
> # create the correlation matrix
> corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3)
> cholMat=chol(corMat)
> # create the matrix of random variables
> set.seed(1000)
> nValues=10000
> 
> # generate some random values
> 
> matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues))
> matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnor
> m(nValues,6,1.8))
> matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues))
> matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),run
> if(nValues,6,10.8))
> 
> # bind to a matrix
> print("correlation Matrix")
> print(corMat)
> print("Cholesky Decomposition")
> print (cholMat)
> 
> # test same normal
> 
> resultMatNormalAllSame=matNormalAllSame%*%cholMat
> print("correlation matNormalAllSame")
> print(cor(resultMatNormalAllSame))
> 
> # test different normal
> 
> resultMatNormalDifferent=matNormalDifferent%*%cholMat
> print("correlation matNormalDifferent")
> print(cor(resultMatNormalDifferent))
> 
> # test same uniform
> resultMatUniformAllSame=matUniformAllSame%*%cholMat
> print("correlation matUniformAllSame")
> print(cor(resultMatUniformAllSame))
> 
> # test different uniform
> resultMatUniformDifferent=matUniformDifferent%*%cholMat
> print("correlation matUniformDifferent")
> print(cor(resultMatUniformDifferent))
> 
> and results
> 
> [1] "correlation Matrix"
>      [,1] [,2] [,3]
> [1,]  1.0  0.6  0.3
> [2,]  0.6  1.0  0.5
> [3,]  0.3  0.5  1.0
> [1] "Cholesky Decomposition"
>      [,1] [,2]      [,3]
> [1,]    1  0.6 0.3000000
> [2,]    0  0.8 0.4000000
> [3,]    0  0.0 0.8660254
> [1] "correlation matNormalAllSame" <== ok
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.6036468 0.3013823
> [2,] 0.6036468 1.0000000 0.5005440
> [3,] 0.3013823 0.5005440 1.0000000
> [1] "correlation matNormalDifferent" <== no good
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.9141472 0.2676162
> [2,] 0.9141472 1.0000000 0.2959178
> [3,] 0.2676162 0.2959178 1.0000000
> [1] "correlation matUniformAllSame" <== ok
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.5971519 0.2959195
> [2,] 0.5971519 1.0000000 0.5011267
> [3,] 0.2959195 0.5011267 1.0000000
> [1] "correlation matUniformDifferent" <== no good
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.2312000 0.0351460
> [2,] 0.2312000 1.0000000 0.1526293
> [3,] 0.0351460 0.1526293 1.0000000
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list