[R] generating correlated random variables from different distributions
Richard and Barbara Males
rbmales at gmail.com
Sun May 2 21:36:56 CEST 2010
Thank you for your reply. The application is a Monte Carlo simulation
in environmental planning. Different possible remediation measures
have different costs, and produce different results. For example, a
$20,000 plan may add 10 acres of wetlands and 12 acres of bird
habitat. The desire is to describe the uncertainty in the cost and
the outputs (acres of wetlands, acres of bird habitat) by
distributions. The cost may be described by a normal distribution,
mean $20k, $5k SD, and the 12 acres of birds may be described by a
uniform distribution (10 to 14). [These are just examples, not
representative of a real problem]. We may know (or think) that
wetlands and bird habitat are positively correlated (0.6), and that
there is a stronger correlation of both with cost (0.85). So the
effort is to generate, through MCS, values at each iteration of cost,
acres of wetland, and acres of bird habitat, such that the resultant
values give the same correlation, and the values of cost, bird habitat
and wetland habitat return the input distributions. The overall
desire is compare different remediation measures, taking into account
uncertainty in costs and results.
One possible approach (although I have not tried it yet, but will do
so in the near future) is to generate, for each iteration, three
independent (0,1) random variables, correlate them via the Cholesky
approach, and use them as input to the inverse normal, inverse
uniform, etc. to get the three variables for each iteration. The
primary distributions of interest are normal, uniform, triangular,
gamma, and arbitrary cdf, so this approach seems plausible in that
inverse distributions are readily available.
Thanks in advance.
Dick Males
Cincinnati, OH, USA
On Thu, Apr 29, 2010 at 12:31 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> The method you are using (multiply by cholesky) works for normal distributions, but not necessarily for others (if you want different means/sd, then add/multiply after transforming).
>
> For other distributions this process can sometimes give the correlation you want, but may change the variable(s) to no longer have the desired distribution.
>
> The short answer to your question is "It Depends", the full long answer could fill a full semester course. If you tell us more of your goal we may be able to give a more useful answer. The copula package is one possibility. If you know the conditional distribution of each variable given the others then you can use gibbs sampling.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>> project.org] On Behalf Of Richard and Barbara Males
>> Sent: Thursday, April 29, 2010 9:18 AM
>> To: r-help at r-project.org
>> Subject: [R] generating correlated random variables from different
>> distributions
>>
>> I need to generate a set of correlated random variables for a Monte
>> Carlo simulation. The solutions I have found
>> (http://www.stat.uiuc.edu/stat428/cndata.html,
>> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers), using
>> Cholesky Decomposition, seem to work only if the variables come from
>> the same distribution with the same parameters. My situation is that
>> each variable may be described by a different distribution (or
>> different parameters of the same distribution). This approach does
>> not seem to work, see code and results below. Am I missing something
>> here? My math/statistics is not very good, will I need to generate
>> correlated uniform random variables on (0,1) and then use the inverse
>> distributions to get the desired results I am looking for? That is
>> acceptable, but I would prefer to just generate the individual
>> distributions and then correlate them. Any advice much appreciated.
>> Thanks in advance
>>
>> R. Males
>> Cincinnati, Ohio, USA
>>
>> Sample Code:
>> # Testing Correlated Random Variables
>>
>> # reference
>> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers
>> # reference http://www.stat.uiuc.edu/stat428/cndata.html
>> # create the correlation matrix
>> corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3)
>> cholMat=chol(corMat)
>> # create the matrix of random variables
>> set.seed(1000)
>> nValues=10000
>>
>> # generate some random values
>>
>> matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues))
>> matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnor
>> m(nValues,6,1.8))
>> matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues))
>> matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),run
>> if(nValues,6,10.8))
>>
>> # bind to a matrix
>> print("correlation Matrix")
>> print(corMat)
>> print("Cholesky Decomposition")
>> print (cholMat)
>>
>> # test same normal
>>
>> resultMatNormalAllSame=matNormalAllSame%*%cholMat
>> print("correlation matNormalAllSame")
>> print(cor(resultMatNormalAllSame))
>>
>> # test different normal
>>
>> resultMatNormalDifferent=matNormalDifferent%*%cholMat
>> print("correlation matNormalDifferent")
>> print(cor(resultMatNormalDifferent))
>>
>> # test same uniform
>> resultMatUniformAllSame=matUniformAllSame%*%cholMat
>> print("correlation matUniformAllSame")
>> print(cor(resultMatUniformAllSame))
>>
>> # test different uniform
>> resultMatUniformDifferent=matUniformDifferent%*%cholMat
>> print("correlation matUniformDifferent")
>> print(cor(resultMatUniformDifferent))
>>
>> and results
>>
>> [1] "correlation Matrix"
>> [,1] [,2] [,3]
>> [1,] 1.0 0.6 0.3
>> [2,] 0.6 1.0 0.5
>> [3,] 0.3 0.5 1.0
>> [1] "Cholesky Decomposition"
>> [,1] [,2] [,3]
>> [1,] 1 0.6 0.3000000
>> [2,] 0 0.8 0.4000000
>> [3,] 0 0.0 0.8660254
>> [1] "correlation matNormalAllSame" <== ok
>> [,1] [,2] [,3]
>> [1,] 1.0000000 0.6036468 0.3013823
>> [2,] 0.6036468 1.0000000 0.5005440
>> [3,] 0.3013823 0.5005440 1.0000000
>> [1] "correlation matNormalDifferent" <== no good
>> [,1] [,2] [,3]
>> [1,] 1.0000000 0.9141472 0.2676162
>> [2,] 0.9141472 1.0000000 0.2959178
>> [3,] 0.2676162 0.2959178 1.0000000
>> [1] "correlation matUniformAllSame" <== ok
>> [,1] [,2] [,3]
>> [1,] 1.0000000 0.5971519 0.2959195
>> [2,] 0.5971519 1.0000000 0.5011267
>> [3,] 0.2959195 0.5011267 1.0000000
>> [1] "correlation matUniformDifferent" <== no good
>> [,1] [,2] [,3]
>> [1,] 1.0000000 0.2312000 0.0351460
>> [2,] 0.2312000 1.0000000 0.1526293
>> [3,] 0.0351460 0.1526293 1.0000000
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list