[R] Generating correlated data from uniform distribution
Greg Snow
greg.snow at ihc.com
Tue Jul 5 18:34:39 CEST 2005
Here is an approach using 'optim' and simulated annealing:
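# x and y: two independent Uniform(0,1) samples, both sorted, so the
# identity ordering starts with a correlation close to 1
x <- sort(runif(1000))
y <- sort(runif(1000))
ord <- 1:1000

# objective: squared distance between cor(x, y[ord]) and the target 0.6
target <- function(ord) { (cor(x, y[ord]) - 0.6)^2 }

# candidate generator 1: swap two positions chosen anywhere
new.point <- function(ord) {
  tmp <- sample(length(ord), 2)
  ord[tmp] <- ord[rev(tmp)]
  ord
}

# candidate generator 2: swap two positions at most 100 apart
# (needs length(ord) > 100)
new.point2 <- function(ord) {
  tmp  <- sample(length(ord) - 100, 1)
  tmp2 <- sample(100, 1)
  ord[c(tmp, tmp + tmp2)] <- ord[c(tmp + tmp2, tmp)]
  ord
}

# for method = "SANN", optim() uses the 'gr' argument to generate new points
res  <- optim(ord, target, new.point, method = "SANN",
              control = list(maxit = 6000, temp = 2000, trace = TRUE))
res2 <- optim(ord, target, new.point2, method = "SANN",
              control = list(maxit = 60000, temp = 200, trace = TRUE))

# result 1: y is only permuted, so its marginal (histogram) is unchanged
y <- y[res$par]
par(mfrow = c(2, 2))
hist(x)
hist(y)
plot(x, y)
cor(x, y)

# result 2: sort(y) recovers the sorted y that res2 was optimized against
y <- sort(y)[res2$par]
par(mfrow = c(2, 2))
hist(x)
hist(y)
plot(x, y)
cor(x, y)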
Hope this helps,
Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111
>>> "Jim Brennan" <jfbrennan at rogers.com> 07/01/05 05:25PM >>>
OK, now I am skeptical, especially when you say "in a weird way" :-)
This may be OK, but look at plot(x,y): it makes me suspicious. Is it still
all right with this kind of relationship?
For large N, Spencer's method appears to return a slightly lower
correlation for the uniforms than for the normals, so maybe there is a
problem!?!
Hope we are all learning something and Menghui gets/has what he wants. :-)
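Jim's observation can be reproduced with a short simulation. This is a sketch under an assumption: "Spencer's method" is taken to mean drawing bivariate normals with correlation rho and mapping each margin through pnorm() (his message is not quoted here). The seed and N are arbitrary choices. The last entry is the closed form (6/pi)*asin(rho/2) for the Pearson correlation of pnorm()-transformed standard bivariate normals.

```r
set.seed(1)                                   # arbitrary seed
N   <- 1e6
rho <- 0.6
z1  <- rnorm(N)
z2  <- rho * z1 + sqrt(1 - rho^2) * rnorm(N)  # standard normal, cor(z1, z2) = rho
u1  <- pnorm(z1)                              # Uniform(0, 1) margins
u2  <- pnorm(z2)
c(normals  = cor(z1, z2),
  uniforms = cor(u1, u2),
  formula  = 6 / pi * asin(rho / 2))          # theoretical value for the uniforms
```

With rho = 0.6 the formula gives about 0.582, so the shortfall Jim sees is systematic rather than sampling noise.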
-----Original Message-----
From: pd at pubhealth.ku.dk [mailto:pd at pubhealth.ku.dk] On Behalf Of Peter Dalgaard
Sent: July 1, 2005 6:59 PM
To: Jim Brennan
Cc: 'Tony Plate'; 'Menghui Chen'; r-help at stat.math.ethz.ch
Subject: Re: [R] Generating correlated data from uniform distribution
"Jim Brennan" <jfbrennan at rogers.com> writes:
> Yes, you are right; I guess this works only for normal data. Free advice
> sometimes comes with too little consideration :-)
Worth every cent...
> Sorry about that, and thanks to Spencer for the correct way.
Hmm, but is it? Or rather, what is the relation between the
correlation of the normals and that of the transformed variables?
Looks nontrivial to me.
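One way to pin the relation down, assuming (as the thread suggests) that the uniforms are U = Φ(X) and V = Φ(Y) for a standard bivariate normal pair (X, Y) with correlation ρ. Here X' and Y' are auxiliary standard normals, independent of each other and of (X, Y), introduced only for the argument. The pair (X' - X, Y' - Y) is bivariate normal with variances 2 and covariance ρ, i.e. correlation ρ/2, and for a centred bivariate normal with correlation r the orthant probability is 1/4 + arcsin(r)/(2π) (Sheppard's formula). Since Var(U) = Var(V) = 1/12:

```latex
\begin{aligned}
E[UV] &= E[\Phi(X)\,\Phi(Y)] = P(X' \le X,\; Y' \le Y)
       = P(X' - X \le 0,\; Y' - Y \le 0)
       = \tfrac14 + \frac{\arcsin(\rho/2)}{2\pi} \\
\operatorname{cor}(U,V) &= 12\,\bigl(E[UV] - \tfrac14\bigr)
       = \frac{6}{\pi}\arcsin\!\Bigl(\frac{\rho}{2}\Bigr)
       \qquad\Longleftrightarrow\qquad
       \rho = 2\sin\!\Bigl(\frac{\pi\,\operatorname{cor}(U,V)}{6}\Bigr)
\end{aligned}
```

For ρ = 0.6 this gives about 0.582, slightly below ρ as Jim noticed; to get uniforms with correlation 0.6 one would use ρ = 2 sin(0.1π), about 0.618, on the normal scale.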
Incidentally, here's a way that satisfies the criteria, but in a
rather weird way:
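N <- 10000
rho <- .6
x <- runif(N, -.5, .5)   # uniform and symmetric about 0
# flip the sign of x with probability (1-rho)/2, so the sign has mean rho
y <- x * sample(c(1, -1), N, replace = TRUE, prob = c((1+rho)/2, (1-rho)/2))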
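A quick numerical check of why this construction satisfies both criteria, written as a sketch (the seed, the larger N and the name s for the random sign are illustrative choices, not part of the original). Because x is symmetric about 0, multiplying it by an independent random sign leaves it Uniform(-0.5, 0.5), and Cov(x, s*x) = E[s] Var(x) = rho Var(x), so cor(x, y) = rho. The catch behind Jim's suspicion is that every point lies on one of the lines y = x or y = -x.

```r
set.seed(2)                       # arbitrary seed, for reproducibility
N   <- 1e5
rho <- 0.6
x   <- runif(N, -0.5, 0.5)
# s is +1 with probability (1+rho)/2 and -1 otherwise, so E[s] = rho
s   <- sample(c(1, -1), N, replace = TRUE,
              prob = c((1 + rho) / 2, (1 - rho) / 2))
y   <- x * s
range(y)                          # stays inside (-0.5, 0.5)
mean(s)                           # close to rho
cor(x, y)                         # close to rho
all(abs(y) == abs(x))             # every point is on y = x or y = -x
```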
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html