[R] Generating uniformly distributed correlated data.

Erich Neuwirth erich.neuwirth at univie.ac.at
Tue Feb 22 18:11:37 CET 2011


You do not need the regions with the double densities near the center
(1/2,1/2); you can use just the parallel 45 degree borders in that
region as well.

> x---------------------\\\\x
> |--------------------/*\\\|
> |-------------------/***\\|
> |------------------/*****\|
> |-----------------/******/|
> |----------------/******/-|
> |---------------/******/--|
> |--------------/******/---|
> |-------------/******/----|
> |------------/******/-----|
> |-----------/******/------|
> |----------/******/-------|
> |---------/******/--------|
> |--------/******/---------|
> |-------/******/----------|
> |------/******/-----------|
> |-----/******/------------|
> |----/******/-------------|
> |---/******/--------------|
> |--/******/---------------|
> |-/******/----------------|
> |/******/-----------------|
> |\*****/------------------|
> |\\***/-------------------|
> |\\\\/--------------------|
> \\\\\---------------------x
>


The problem with this range is that (for a = half width of the diagonal
stripe) you have r = 1-2*a^2+a^3, which does not have a nice inverse
function, so computing the data for a given r is not
straightforward. Furthermore, creating the y's is harder because
the simple trick using %% for uniform random numbers cannot
create random numbers with 2 different nonzero densities in
two different intervals.
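
A minimal sketch of one way around both points, under assumptions that
are mine and not taken from the thread: a is the half width of the
stripe measured along y, the "simple trick" is taken to be
y = (x + noise) %% 1, and the double-density corners are produced by
folding values that leave [0,1] back at the borders instead of wrapping
them. Rather than inverting r(a) in closed form, it estimates r from
simulated data and lets uniroot() search for a. The names reflect01,
make_y, r_hat, target and a_star are made up for illustration.

```r
# Fold values back into [0, 1]; valid for inputs in [-1, 2].
reflect01 <- function(z) {
  z <- abs(z)              # values below 0 are mirrored at 0
  ifelse(z > 1, 2 - z, z)  # values above 1 are mirrored at 1
}

set.seed(1)                # fixed seed so the run is repeatable
n <- 1e5
x <- runif(n)
u <- runif(n, -1, 1)       # same draws reused for every a (rescaled below)

# y stays within a of x; overshoot is folded back into the corners,
# which is what gives those corners twice the density of the stripe
make_y <- function(a) reflect01(x + a * u)

# empirical correlation as a function of the half width a
r_hat <- function(a) cor(x, make_y(a))

# numerical inversion: search for the a in (0, 1] that matches the target r
target <- 0.8
a_star <- uniroot(function(a) r_hat(a) - target, interval = c(0.01, 1))$root
y <- make_y(a_star)

# for contrast, the wrap-around version (assumed form of the %% trick):
# overshoot lands in the opposite corner instead of being folded back
y_wrap <- (x + a_star * u) %% 1
```

After running this, cor(x, y) should be close to target, and
plot(x, y, pch = ".") should show the stripe with the denser corners.
Reusing the same x and u for every a keeps r_hat() smooth in a, which
is what makes the root search behave.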



>>
>> There is the same number of stars in each horizontal row and each
>> vertical column.
> 
> I love it! I plotted the data that your method generated and got a plot
> with points in the regions you displayed. After pondering its curious
> pathology (three disjoint regions), I thought I could cure that
> pathology by flipping quadrants. I proceeded to do so but discovered
> that the pathology wasn't really cured but only concentrated at the ends
> and middle. Using your ASCII art, my version would look like this, where
> the \\\ regions are "double dense".
> 
> y[x>0.5 & y <0.5] = 0.5 +abs(0.5-y[x>0.5 & y <0.5])
> y[x<0.5 & y >0.5] = 0.5 -abs(0.5-y[x<0.5 & y >0.5])
> plot(x,y)
> 
> x---------------------\\\\x
> |--------------------/*\\\|
> |-------------------/***\\|
> |------------------/*****\|
> |-----------------/******/|
> |----------------/******/-|
> |---------------/******/--|
> |--------------/******/---|
> |-------------/******/----|
> |-------------|\\***/-----|
> |-------------|\\\*/------|
> |-------------|\\\\-------|
> |-------------*-----------|
> |--------/\\\\------------|
> |-------/**\\\------------|
> |------/****\\------------|
> |-----/******\------------|
> |----/******/-------------|
> |---/******/--------------|
> |--/******/---------------|
> |-/******/----------------|
> |/******/-----------------|
> |\*****/------------------|
> |\\***/-------------------|
> |\\\\/--------------------|
> \\\\\---------------------x
> 
> So it not only created a still slightly less pathological counterpart,
> but the correlation jumped from 0.5 to 0.95. It looks to be a promising
> basis for homework problems in probability courses, an experience I have
> never had the ?pleasure? to experience except during self-study to
> repair my (many) mathematical deficiencies.
> 
>> cor(x,y)
> [1] 0.9449256
>


