[R] Generating uniformly distributed correlated data.

Mike Marchywka marchywka at hotmail.com
Mon Feb 21 16:00:24 CET 2011

----------------------------------------
> Date: Mon, 21 Feb 2011 15:53:26 +0100
> From: erich.neuwirth at univie.ac.at
> To: marchywka at hotmail.com
> CC: soren.faurby at biology.au.dk; r-help at r-project.org
> Subject: Re: [R] Generating uniformly distributed correlated data.
>
> We want to generate a distribution on the unit square with the following
> properties:
> * It is concentrated on a "reasonable" subset of the square,
> and the restricted distribution is uniform on this subset.
> * Both marginal distributions are uniform on the unit interval.
> * All horizontal and all vertical cross sections are sets of line
> segments with the same total length.
>
> If we find a geometric figure with these properties, we have solved the
> problem.
>
> So we define the distribution to be uniform on the following region
> (the drawing is distorted, but it should give the idea):
>
> x***/-----------------/***x
> |**/-----------------/****|
> |*/-----------------/*****|
> |/-----------------/******|
> |-----------------/******/|
> |----------------/******/-|
> |---------------/******/--|
> |--------------/******/---|
> |-------------/******/----|
> |------------/******/-----|
> |-----------/******/------|
> |----------/******/-------|
> |---------/******/--------|
> |--------/******/---------|
> |-------/******/----------|
> |------/******/-----------|
> |-----/******/------------|
> |----/******/-------------|
> |---/******/--------------|
> |--/******/---------------|
> |-/******/----------------|
> |/******/-----------------|
> |******/-----------------/|
> |*****/-----------------/*|
> |****/-----------------/**|
> x***/-----------------/***x
>
> There is the same number of stars in each horizontal row and each
> vertical column.
>
>
> So we define
> g(x1,x2) = 1   if abs(x1-x2) <= a or
>                   abs(x1-x2+1) <= a or
>                   abs(x1-x2-1) <= a
>            0   otherwise
>
> The total area of the shape is 2*a, and the admissible range for a
> is [0, 1/2]. Therefore, for a > 0,
> f(x1,x2) = g(x1,x2)/(2*a)
> is a density function.
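As a quick numerical check of the area claim, here is a sketch in Python (an illustration only; the thread itself works in R). The helper names `g` and `band_area` are hypothetical. `band_area` approximates the area of the region where g = 1 with a midpoint rule on a grid, so for 0 < a <= 1/2 it should come out close to 2*a.

```python
# Illustrative check (hypothetical helper names): the region where g = 1
# should have area 2*a for 0 < a <= 1/2.


def g(x1: float, x2: float, a: float) -> int:
    """Indicator of the wrapped band around the diagonal of the unit square."""
    d = x1 - x2
    return int(abs(d) <= a or abs(d + 1) <= a or abs(d - 1) <= a)


def band_area(a: float, n: int = 400) -> float:
    """Midpoint-rule estimate of the area of {g = 1} on an n-by-n grid."""
    h = 1.0 / n
    hits = 0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            hits += g(x1, (j + 0.5) * h, a)
    return hits * h * h


for a in (0.1, 0.25, 0.5):
    print(f"a={a}: estimated area {band_area(a):.4f}, 2*a = {2 * a}")
```

The grid estimate is only accurate to about the grid spacing, because grid points lying exactly on the band's edges are counted as inside.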
> This is where simple algebra comes in.
> This distribution has
> expected value 1/2 and variance 1/12 for both margins
> (uniform distribution), and it has
> covariance = (1 - 3*a + 2*a^2)/12
> and correlation = 1 - 3*a + 2*a^2
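The covariance can be derived in a few lines. This derivation assumes the distribution is written as X2 = (X1 + E) mod 1 with X1 ~ U(0,1) and E ~ U(-a, a) independent, which is equivalent to the uniform density on the band (it is also my reading of the conditional sampling scheme described below). For 0 <= e <= a, X2 = X1 + e when X1 < 1 - e and X2 = X1 + e - 1 otherwise:

```latex
\begin{aligned}
\mathbb{E}[X_1 X_2 \mid E = e]
  &= \int_0^1 x\,(x+e)\,dx \;-\; \int_{1-e}^{1} x\,dx
   = \tfrac13 + \tfrac{e}{2} - \bigl(e - \tfrac{e^2}{2}\bigr)
   = \tfrac13 - \tfrac{|e|}{2} + \tfrac{e^2}{2}
   \quad (\text{same result for } -a \le e < 0),\\
\mathbb{E}[X_1 X_2]
  &= \tfrac13 - \tfrac12\,\mathbb{E}|E| + \tfrac12\,\mathbb{E}[E^2]
   = \tfrac13 - \tfrac{a}{4} + \tfrac{a^2}{6},\\
\operatorname{Cov}(X_1, X_2)
  &= \mathbb{E}[X_1 X_2] - \tfrac14
   = \frac{1 - 3a + 2a^2}{12},\\
\operatorname{Corr}(X_1, X_2)
  &= \frac{\operatorname{Cov}(X_1, X_2)}{1/12}
   = 1 - 3a + 2a^2 = (1-a)(1-2a).
\end{aligned}
```

The factored form makes the endpoints easy to read off: a = 0 gives correlation 1 and a = 1/2 gives correlation 0.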
>
> The inverse function of r = 1 - 3*a + 2*a^2 on [0, 1/2] is
> a = (3 - sqrt(1 + 8*r))/4
>
> Therefore, for any target correlation r in [0, 1], the distribution with
> a = (3 - sqrt(1 + 8*r))/4
> has exactly that correlation r.
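The inversion can be checked with a round trip. This is a small Python sketch with hypothetical function names; rejecting r outside [0, 1] is my own choice, since this construction only reaches correlations in that range.

```python
import math


def correlation(a: float) -> float:
    """Correlation of the band distribution with half-width a, 0 <= a <= 1/2."""
    return 1 - 3 * a + 2 * a ** 2


def half_width_for(r: float) -> float:
    """Half-width a in [0, 1/2] that yields correlation r, for 0 <= r <= 1.

    Takes the smaller root of 2a^2 - 3a + (1 - r) = 0; the other root,
    (3 + sqrt(1 + 8r))/4, is at least 1 and so lies outside [0, 1/2].
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("this construction only covers 0 <= r <= 1")
    return (3 - math.sqrt(1 + 8 * r)) / 4


for r in (0.0, 0.3, 0.75, 1.0):
    a = half_width_for(r)
    print(f"r={r:.2f}  a={a:.4f}  correlation(a)={correlation(a):.4f}")
```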
>
>
> How do we create random numbers from this distribution?
> By using conditional densities.
> x1 is sampled from the uniform distribution, and for a given x1
> we produce x2 from the uniform distribution along the vertical cross
> section of the geometric shape (which consists of either 1 or 2 intervals).
> This is most easily implemented with the modulo operator %%.
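A minimal sketch of this sampler, written in Python rather than R for illustration. It assumes the vertical cross section at x1 is the wrapped interval [x1 - a, x1 + a] mod 1, so that x2 = (x1 + u) mod 1 with u ~ U(-a, a). Python's `%` with a positive float divisor returns a non-negative result, like R's `%%`. The function names and the seed are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_correlated_uniform(n: int, r: float, rng: random.Random) -> List[Tuple[float, float]]:
    """Draw n pairs with U(0,1) margins and correlation r (0 <= r <= 1).

    Assumes the construction x2 = (x1 + u) mod 1 with u ~ U(-a, a).
    """
    a = (3 - math.sqrt(1 + 8 * r)) / 4      # half-width of the band
    pairs = []
    for _ in range(n):
        x1 = rng.random()                   # x1 ~ U(0, 1)
        u = rng.uniform(-a, a)              # uniform offset along the cross section
        x2 = (x1 + u) % 1.0                 # wrap back into the unit interval, like R's %%
        pairs.append((x1, x2))
    return pairs


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2011)
x1s, x2s = zip(*sample_correlated_uniform(100_000, 0.6, rng))
print(f"sample correlation: {pearson(x1s, x2s):.3f} (target 0.6)")
print(f"share of x2 below 0.25: {sum(y < 0.25 for y in x2s) / len(x2s):.3f}")
```

Both printed values are subject to sampling noise; with 100,000 pairs the correlation should land close to the target and the share below 0.25 close to 0.25.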
>
> This mechanism is NOT a convolution. Applying the modulo operation after
> the addition makes it a non-convolution. Adding independent random variables
> without doing anything further is a convolution; once a trimming
> operation is applied, the convolution property is lost.
>
>
The expression inside the mod is a convolution. As I mentioned, the effect of
the mod is to move the pieces that fall outside the desired range back into it,
and they happen to restore the uniform distribution. I thought my
explanation was simple and easy after the fact, but I am not sure
it would have motivated the original design very well.
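Mike's point can be checked exactly rather than by simulation. The sketch below (Python, hypothetical helper names) assumes the construction X2 = (X1 + E) mod 1 with E ~ U(-a, a). The density of the sum X1 + E is the convolution of the two uniform densities: a trapezoid on [-a, 1 + a] that drops below 1 near 0 and near 1. Folding the pieces below 0 and above 1 back into the unit interval adds exactly the missing mass, so the wrapped density is 1 everywhere.

```python
def convolved_density(s: float, a: float) -> float:
    """Density of S = X1 + E, X1 ~ U(0,1), E ~ U(-a,a): a trapezoid on [-a, 1+a]."""
    overlap = min(1.0, s + a) - max(0.0, s - a)   # length of [0,1] intersected with [s-a, s+a]
    return max(overlap, 0.0) / (2 * a)


def wrapped_density(y: float, a: float) -> float:
    """Density of S mod 1 at y in [0, 1): fold the pieces outside [0, 1] back in."""
    return convolved_density(y, a) + convolved_density(y - 1, a) + convolved_density(y + 1, a)


a = 0.2
for y in (0.0, 0.1, 0.5, 0.9):
    print(f"y={y}: convolved {convolved_density(y, a):.3f}, wrapped {wrapped_density(y, a):.3f}")
```

The trapezoid is the "thing inside the mod"; the mod only moves the two tails, and for 0 < a <= 1/2 the moved tails exactly fill in the two ramps.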

More information about the R-help mailing list