[R] Generation of correlated variables

Petr Savicky savicky at cs.cas.cz
Thu Mar 15 20:01:46 CET 2012

On Thu, Mar 15, 2012 at 10:48:48AM -0700, Filoche wrote:
> Hi everyone.
> Based on a dependent variable (y), I'm trying to generate some independent
> variables with a specified correlation. That part is no problem.
> However, I would like all my "regressors" to be orthogonal (i.e.
> no correlation among them).
> For example,
> y = x1 + x2 + x3, where the correlations of y with x1, x2 and x3 are 0.7, 0.4
> and 0.8.  However, x1, x2 and x3 should not be correlated with each other.


If the following computation is correct, then there is no solution
for the required correlations. However, there is one if the vector of
the required correlations is normalized to have sum of squares 1.

Assume that the variables x1, x2, x3 have mean zero, denote s1^2 = var(x1),
s2^2 = var(x2), s3^2 = var(x3), and assume zero correlations among
x1, x2, x3, hence also zero covariances. Then, for y = x1 + x2 + x3,

  var(y) = s1^2 + s2^2 + s3^2
  E y x1 = E x1^2 + E x1 x2 + E x1 x3 = E x1^2 = s1^2

and similarly

  E y x2 = var(x2) = s2^2
  E y x3 = var(x3) = s3^2

So, the correlation cor(y, x1), which is E y x1 divided by the product
of the standard deviations of x1 and y, is

  s1^2/(s1 * sqrt(s1^2 + s2^2 + s3^2)) = s1/sqrt(s1^2 + s2^2 + s3^2)

Expressing all the correlations in this way, we get

  cor(y, x1) = s1/sqrt(s1^2 + s2^2 + s3^2)
  cor(y, x2) = s2/sqrt(s1^2 + s2^2 + s3^2)
  cor(y, x3) = s3/sqrt(s1^2 + s2^2 + s3^2)
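
The formulas above can be checked without any simulation by working
directly with covariance matrices. The following is only a minimal
sketch: the standard deviations in s are hypothetical values chosen
for illustration, and the names A, Sigma_x, Sigma and R are not part
of the original question.

```r
# Hypothetical standard deviations, chosen only for illustration
s <- c(2, 1, 3)

# Covariance matrix of (x1, x2, x3): uncorrelated, so it is diagonal
Sigma_x <- diag(s^2)

# Rows of A map (x1, x2, x3) to (y, x1, x2, x3), where y = x1 + x2 + x3
A <- rbind(c(1, 1, 1), diag(3))

# Covariance matrix of (y, x1, x2, x3), converted to a correlation matrix
Sigma <- A %*% Sigma_x %*% t(A)
R <- cov2cor(Sigma)

cor_exact   <- R[1, 2:4]            # cor(y, x1), cor(y, x2), cor(y, x3)
cor_formula <- s / sqrt(sum(s^2))   # the closed form derived above

all.equal(cor_exact, cor_formula)   # compares the two up to rounding
sum(cor_exact^2)                    # sum of the squared correlations
```

Since the comparison uses population covariances, any disagreement
would point to an error in the derivation and not to sampling noise.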

Squaring and summing these, we have cor(y, x1)^2 + cor(y, x2)^2 + cor(y, x3)^2 = 1.
For your numbers, we get

  r <- c(0.7, 0.4, 0.8)
  sum(r^2) # [1] 1.29

So, for these numbers, the conditions are contradictory. However,
a solution may be found for the vector of correlations

  [1] 0.6163156 0.3521804 0.7043607

which are the original correlations normalized to have sum of
squares 1. In this case, independent normal variables with the
standard deviations (s1, s2, s3) == r/sqrt(1.29) will satisfy
your conditions.
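
As a rough illustration of this construction, the normalized
correlations can be computed from r and the variables simulated.
This is only a sketch: the seed, the sample size n and the names
r_norm, cor_yx and cor_xx are my own choices, and the sample
correlations will match the targets only approximately.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 1e5                         # sample size is an illustrative choice

r <- c(0.7, 0.4, 0.8)
r_norm <- r / sqrt(sum(r^2))     # normalized to have sum of squares 1

# Independent normal variables with standard deviations r_norm,
# stored as the columns of an n x 3 matrix
x <- sapply(r_norm, function(s) rnorm(n, mean = 0, sd = s))
y <- rowSums(x)                  # y = x1 + x2 + x3

cor_yx <- as.vector(cor(y, x))   # sample cor(y, x1), cor(y, x2), cor(y, x3)
cor_xx <- cor(x)                 # sample correlations among x1, x2, x3

round(cor_yx, 3)                 # compare with r_norm
round(cor_xx, 3)                 # off-diagonal entries should be near 0
```

If exactly zero sample correlations among x1, x2, x3 are needed, the
simulated columns would have to be orthogonalized in addition, which
this sketch does not do.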

I hope that other members of the list will correct me if I overlooked
something.

Petr Savicky.
