[R] how to generate a random correlation matrix with restrictions

Kornelius Rohmeyer kornelius.rohmeyer at small-projects.de
Mon Aug 31 11:57:08 CEST 2009


Hi,

i want to pick up your last method and add that "about 10% of r_ij are
greater than 0.9" could mean for example that there is a cluster of
about sqrt(1000) variables that are highly correlated.

But first let us note, that there is no canonical meaning for the
randomness of a random correlation matrix, so whether this is adequate
depends on the underlying problem/simulation study/test cases.

Instead of drawing random vectors via rnorm, one could draw for example
42 vectors from a multivariate normal distribution with high correlation
in the first 42 components:

require("Matrix")
require("mvtnorm")
n <- 100
n1 <- 42
n2 <- n - n1
R <- bdiag(matrix(0.99,nrow=n1,ncol=n1)+diag(n1)*0.01,diag(n2)*0.01)
R <- as.matrix(R)

random.correlation.matrix <- function() {
  m1 <- rmvnorm(n1,rep(0,n),R)
  m2 <- rmvnorm(n2,rep(0,n),diag(n))
  m <- rbind(m1,m2)
  for (i in 1:n) {
    # normalization - we want a correlation matrix
    m[i,] <- m[i,]/sqrt(t(m[i,])%*%m[i,])
  }
  m%*%t(m)
}

Let us count how many entries have an absolute value greater than 0.9:

count.gt.90 <- function() {
  m <- random.correlation.matrix()
  sum(abs(m)>0.90)
}

> mean(replicate(100, count.gt.90()))
[1] 999.32

hist(replicate(100, count.gt.90()))

Looks well... But remember that you now have every time 42 highly
correlated variables (which are also always the first 42 ones). Perhaps
you want to make this number also random. And perhaps you want two or
three or a random number of different clusters of highly correlated
variables? Depends all on your objective - have fun! :-)

Cheers, Kornelius.

> Yes, Kingsford.  This problem does not appear to be trival.  I am not sure if there is any literature on this.  I have seen a paper by Davies and Higham (BIT 2000) on "Stable generation of correlation matrices.  There is also a paper by Harry Joe on generating correlation matrices with given partial correlation structure.  I am not sure if these papers would be helpful for this problem.  Of course, one can readily cook up an ad-hoc, trial-error procedure to generate the type of matrices that Ning wants.
> 
> By the way, the easiest way to generate a PD matrix is:
> 
> N <- 10
> amat <- matrix(rnorm(N*N), N, N)
> A <- amat %*% t(amat)  # A is PD 
> 
> Best,
> Ravi
> 
> ____________________________________________________________________
> 
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
> 
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
> 
> 
> ----- Original Message -----
> From: Kingsford Jones <kingsfordjones at gmail.com>
> Date: Saturday, August 29, 2009 7:43 pm
> Subject: Re: [R] how to generate a random correlation matrix with restrictions
> To: Ravi Varadhan <rvaradhan at jhmi.edu>
> Cc: Ning Ma <pningma at gmail.com>, r-help at r-project.org
> 
> 
> > Thanks Ravi -- with my limited linear algebra skills I was assuming
> >  invertible symmetric was sufficient (rather than just necessary) for
> >  positive definiteness.  So, the open challenge is to generate a pd
> >  matrix of dimension 100 with r_ij = 1 if i=j else -1< r_ij< 1, with
> >  about 10% of the elements >.9.
> >  
> >  I tried for a bit without luck, but I can offer another way to
> >  generate a pd matrix:
> >  
> >  ev <- runif(100) # choose some positive eigenvalues
> >  U <- svd(matrix(runif(10000), nc=100))$u  # an orthogonal matrix
> >  pdM <- U %*% diag(ev) %*% t(U)
> >  
> >  
> >  Kingsford
> >  
> >  
> >  
> >  On Fri, Aug 28, 2009 at 9:41 PM, Ravi Varadhan<rvaradhan at jhmi.edu> wrote:
> >  > Hi Kingsford,
> >  >
> >  > There is more structure to a correlation matrix than that meets the 
> > eye!  Your method will produce a matrix R that looks "like" a 
> > correlation matrix, but beware - it is an impostor!
> >  >
> >  > You can obtain a valid correlation matrix, Q, from the impostor R 
> > by using the `nearPD' function in the "Matrix" package, which finds 
> > the positive definite matrix Q that is "nearest" to R.  However, note 
> > that when R is far from a positive-definite matrix, this step may give 
> > a Q that does not have the desired property.
> >  >
> >  > require(Matrix)
> >  >
> >  > R <- matrix(runif(16), ncol=4)
> >  >
> >  > R <- (R * lower.tri(R)) + t(R * lower.tri(R))
> >  >
> >  > diag(R) <- 1
> >  >
> >  > eigen(R)$val
> >  >
> >  > Q <- nearPD(R, posd.tol=1.e-04)$mat
> >  >
> >  > eigen(Q)$val
> >  >
> >  > max(abs(Q - R))  # maximum discrepancy between R and Q
> >  >
> >  > Another easy way to produce a valid correlation matrix is:
> >  >
> >  > R <- matrix(runif(36), ncol=6)
> >  >
> >  > RtR <- R %*% t(R)
> >  >
> >  > Q <- cov2cor(RtR)
> >  >
> >  > But this does not have the property that the correlations are 
> > uniformly distributed.
> >  >
> >  > Hope this helps,
> >  > Ravi.
> >  >
> >  > ____________________________________________________________________
> >  >
> >  > Ravi Varadhan, Ph.D.
> >  > Assistant Professor,
> >  > Division of Geriatric Medicine and Gerontology
> >  > School of Medicine
> >  > Johns Hopkins University
> >  >
> >  > Ph. (410) 502-2619
> >  > email: rvaradhan at jhmi.edu
> >  >
> >  >
> >  > ----- Original Message -----
> >  > From: Kingsford Jones <kingsfordjones at gmail.com>
> >  > Date: Friday, August 28, 2009 10:12 pm
> >  > Subject: Re: [R] how to generate a random correlation matrix with   
> >     restrictions
> >  > To: Ning Ma <pningma at gmail.com>
> >  > Cc: r-help at r-project.org
> >  >
> >  >
> >  >> Ahh -- Mark Leeds just pointed out off-list that it was supposed 
> > to be
> >  >> a correlation matrix.
> >  >>
> >  >> Perhaps
> >  >>
> >  >> R <- matrix(runif(10000), ncol=100)
> >  >> R <- (R * lower.tri(R)) + t(R * lower.tri(R))
> >  >> diag(R) <- 1
> >  >>
> >  >> These are of course uniformly distributed positive correlations.
> >  >>
> >  >> Kingsford
> >  >>
> >  >>
> >  >> On Fri, Aug 28, 2009 at 7:36 PM, Kingsford
> >  >> Jones<kingsfordjones at gmail.com> wrote:
> >  >> >  R <- matrix(runif(10000), ncol=100)
> >  >> >
> >  >> > hth,
> >  >> >
> >  >> > Kingsford Jones
> >  >> >
> >  >> > On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma<pningma at gmail.com> wrote:
> >  >> >> Hi,
> >  >> >>
> >  >> >> How can I generate a random 100x100 correlation matrix, R={r_ij},
> >  >> >> where about 10% of r_ij are greater than 0.9
> >  >> >>
> >  >> >> Thanks in advance.
> >  >> >>
> >  >> >> ______________________________________________
> >  >> >> R-help at r-project.org mailing list
> >  >> >>
> >  >> >> PLEASE do read the posting guide
> >  >> >> and provide commented, minimal, self-contained, reproducible code.
> >  >> >>
> >  >> >
> >  >>
> >  >> ______________________________________________
> >  >> R-help at r-project.org mailing list
> >  >>
> >  >> PLEASE do read the posting guide
> >  >> and provide commented, minimal, self-contained, reproducible code.
> >  >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list