[R] Generation of correlated variables
Petr Savicky
savicky at cs.cas.cz
Thu Mar 15 20:01:46 CET 2012
On Thu, Mar 15, 2012 at 10:48:48AM -0700, Filoche wrote:
> Hi everyone.
>
> Based on a dependent variable (y), I'm trying to generate some independent
> variables with a specified correlation. That part is no problem.
> However, I would like all my "regressors" to be orthogonal (i.e.
> no correlation among them).
>
> For example,
>
> y = x1 + x2 + x3 where the correlation of y with x1 is 0.7, with x2 is 0.4
> and with x3 is 0.8. However, x1, x2 and x3 should not be correlated with
> each other.
Hi.
If the following computation is correct, then there is no solution
for the required correlations. There is one, however, if the vector
of required correlations is normalized to have sum of squares 1.
Assume that the variables x1, x2, x3 have mean zero, denote
s1^2 = var(x1), s2^2 = var(x2), s3^2 = var(x3), and assume zero
correlations among x1, x2, x3, and hence zero covariances. Then
var(y) = s1^2 + s2^2 + s3^2
E y x1 = E x1^2 + E x1 x2 + E x1 x3 = E x1^2 = s1^2
and similarly
E y x2 = var(x2) = s2^2
E y x3 = var(x3) = s3^2
So, the correlation cor(y, x1) = E y x1 / (sd(x1) * sd(y)) is
s1^2/(s1 * sqrt(s1^2 + s2^2 + s3^2)) = s1/sqrt(s1^2 + s2^2 + s3^2)
Expressing all the correlations in this way, we get
cor(y, x1) = s1/sqrt(s1^2 + s2^2 + s3^2)
cor(y, x2) = s2/sqrt(s1^2 + s2^2 + s3^2)
cor(y, x3) = s3/sqrt(s1^2 + s2^2 + s3^2)
Clearly, we have cor(y, x1)^2 + cor(y, x2)^2 + cor(y, x3)^2 = 1.
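The identity can be checked numerically for any choice of standard
deviations. The following is only a small sketch: the helper name
implied_cor and the values in s are made up for illustration and do
not come from the original question.

```r
# Hypothetical helper: the correlations cor(y, x_i) implied by the
# standard deviations s of uncorrelated x_i, when y is their sum.
implied_cor <- function(s) s / sqrt(sum(s^2))

s <- c(2, 0.5, 3)            # arbitrary positive standard deviations
r_implied <- implied_cor(s)
sum(r_implied^2)             # equals 1 up to floating point rounding
```

Whatever positive values are put into s, the squared correlations sum
to 1, so the three correlations cannot be chosen freely.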
For your numbers, we get
r <- c(0.7, 0.4, 0.8)
sum(r^2) # [1] 1.29
So, for these numbers, the conditions are contradictory. However,
a solution may be found for the vector of correlations
r/sqrt(1.29)
[1] 0.6163156 0.3521804 0.7043607
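which are the original correlations normalized to have sum of
squares 1. In this case, independent normal variables with the
standard deviations (s1, s2, s3) == r/sqrt(1.29) will satisfy
your conditions. Any common positive multiple of these standard
deviations works as well, since rescaling all of them by the same
factor does not change the correlations.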
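A possible way to try this out is sketched below. The seed, the sample
size n and all variable names are my own choices for illustration.
The first part draws independent normal variables, so the sample
correlations match the targets only approximately. The second part is
an optional variant that uses a QR decomposition of centered random
columns so that the sample correlations among the regressors are zero
up to rounding. That variant is an addition to the argument above,
not part of it.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
r <- c(0.7, 0.4, 0.8)
s <- r / sqrt(sum(r^2))        # normalized correlations, used as sds
n <- 1e5                       # assumed sample size

# Part 1: independent normal regressors with standard deviations s
x <- sapply(s, function(sd_i) rnorm(n, mean = 0, sd = sd_i))
y <- rowSums(x)                # y = x1 + x2 + x3
cor_yx <- as.vector(cor(y, x)) # close to s, up to sampling error
cor_xx <- cor(x)               # off-diagonal entries close to 0

# Part 2: exact sample version via orthonormal, zero-mean columns
z <- scale(matrix(rnorm(n * 3), n, 3), center = TRUE, scale = FALSE)
q <- qr.Q(qr(z))               # columns are orthonormal and sum to zero
x_exact <- sweep(q, 2, s, "*") # column i is scaled by s[i]
y_exact <- rowSums(x_exact)
cor_yx_exact <- as.vector(cor(y_exact, x_exact))  # equals s up to rounding
cor_xx_exact <- cor(x_exact)   # off-diagonal entries zero up to rounding
```

Comparing cor_yx and cor_yx_exact with s shows how close each approach
gets to the normalized targets.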
I hope that other members of the list will correct me if I overlooked
something.
Petr Savicky.