[R] Generating correlated data from uniform distribution

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Sat Jul 2 13:22:19 CEST 2005

On 02-Jul-05 Peter Dalgaard wrote:
> "Jim Brennan" <jfbrennan at rogers.com> writes:
>> OK now I am skeptical especially when you say in a weird way:-)
>> This may be OK but look at plot(x,y) and I am suspicious. Is it still
>> alright with this kind of relationship?
> ...
>> N <- 10000
>> rho <- .6
>> x <- runif(N, -.5,.5)
>> y <- x * sample(c(1,-1), N, replace=T, prob=c((1+rho)/2,(1-rho)/2))
> Well, the covariance is (everything has mean zero, of course)
> E(XY) = (1+rho)/2*EX^2 + (1-rho)/2*E(X*-X) = rho*EX^2 
> The marginal distribution of Y is a mixture of two identical uniforms
> (X and -X) so is uniform and in particular has the same variance as X.
> In summary,  EXY/sqrt(EX^2EY^2) == rho
> So as I said, it satisfies the formal requirements. X and Y are
> uniformly distributed and their correlation is rho. 
> If for nothing else, I suppose that this example is good for
> demonstrating that independence and uncorrelatedness is not the same
> thing.

That was a nice sneaky solution! I was toying with something similar,
but less sneaky, until I saw Peter's, on the lines of

  x <- runif(2*N, -0.5, 0.5); ix <- (N-k):(N+k); y <- x; y[ix] <- -y[ix]

(which makes the same point about independence and correlation).
The larger k as a fraction of N, the more you swing from rho = 1
to rho = -1, but you cannot achieve, as Peter did, an arbitrary
correlation coefficient rho since the value depends on k which
can only take discrete values.
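
A rough check of the block-flip idea, as a sketch only: the helper name
flip_block and the seed are made up for illustration. Flipping the sign
on m = 2k+1 of the 2N values leaves the marginal of y uniform, and by the
same argument as Peter's the correlation should come out near
1 - 2m/(2N) = 1 - (2k+1)/N, so only a discrete grid of rho values is
reachable.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# flip_block is a hypothetical helper name, not from the thread
flip_block <- function(N, k) {
  stopifnot(k >= 0, k < N)     # keeps the indices inside 1..2N
  x  <- runif(2 * N, -0.5, 0.5)
  ix <- (N - k):(N + k)        # 2k + 1 positions in the middle
  y  <- x
  y[ix] <- -y[ix]              # flip the sign on that block
  list(x = x, y = y)
}

N <- 10000
for (k in c(0, 2500, 5000, 7500, 9999)) {
  d <- flip_block(N, k)
  cat(sprintf("k = %5d  sample cor = % .3f  1 - (2k+1)/N = % .3f\n",
              k, cor(d$x, d$y), 1 - (2 * k + 1) / N))
}
```

The sample correlation should track the approximation closely for large
N, moving from near +1 (small k) to near -1 (k close to N).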

Another approach, which leads to a less "special" joint distribution, is

  x<-sort(runif(N, -0.5,0.5)); y<-sort(runif(N, -0.5,0.5))

followed by a rho-dependent permutation of y. I'm still pondering
a way of choosing the permutation so as to get a desired rho.

The extremes are the identity, which for a given sample will
give as close as you can get to rho = +1, and reversal, which
gives as close as you can get to rho = -1.
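
The two extremes are easy to look at for one sample. This is only a
sketch; the seed, the value of N and the variable names are arbitrary
choices, and the random permutation is added purely for contrast.

```r
set.seed(2)  # arbitrary seed, only for reproducibility
N <- 1000
x <- sort(runif(N, -0.5, 0.5))
y <- sort(runif(N, -0.5, 0.5))

r_identity <- cor(x, y)          # y left in sorted order
r_reversal <- cor(x, rev(y))     # y in reverse order
r_random   <- cor(x, sample(y))  # a random permutation, for contrast

print(c(identity = r_identity, reversal = r_reversal, random = r_random))
```

Permuting y does not change its marginal distribution, so every choice
of permutation keeps both margins uniform; only the dependence changes.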

However, the maximum theoretical rho which you can get (as opposed
to what is possible for particular samples, which may get arbitrarily
close to +1) depends on N. For instance, with N=3, it looks as
though the theoretical rho is about 0.9 with the "identity"
permutation (for N=1000, however, just about all samples give
rho > 0.99).
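
One way to look at the dependence on N is by simulation. The sketch
below assumes that "theoretical rho" means the expected sample
correlation under the identity permutation, which is only one reading of
the remark above; the helper name mean_sorted_cor and the number of
replications are likewise made up for illustration.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

# mean_sorted_cor is a hypothetical helper name, not from the thread
mean_sorted_cor <- function(N, nsim = 2000) {
  # sample correlation of two independently drawn, sorted uniform samples
  r <- replicate(nsim, cor(sort(runif(N, -0.5, 0.5)),
                           sort(runif(N, -0.5, 0.5))))
  c(mean = mean(r), min = min(r))
}

for (N in c(3, 10, 100, 1000)) {
  s <- mean_sorted_cor(N)
  cat(sprintf("N = %4d  mean cor = %.3f  min cor = %.3f\n",
              N, s[["mean"]], s[["min"]]))
}
```

The mean should rise towards 1 as N grows, with the spread across
samples shrinking at the same time.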

I smell a source of interesting exam questions ...

Over to you!

Best wishes,

E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-05                                       Time: 12:22:09
------------------------------ XFMail ------------------------------
