[R] correlation matrices: getting p-values?

Bill Venables William.Venables at cmis.CSIRO.AU
Tue Jan 4 06:05:39 CET 2000


> I have to admit that I'm at a bit of a loss here; any pointers would be
> greatly appreciated.
> 
> I've been making correlation matrices from some of my datasets, and
> have been instructed to get the probability values for each of these
> correlations.

To be precise, I take it this means the significance probability:
the chance, under the null hypothesis of zero correlation, of
getting a correlation at least as far from zero in absolute value
as the one you observed.

> 
> I've checked the online help for info on both the cor and cov functions,
> but I was unable to find any relevant info on finding how to obtain
> these probability values. 

The connexion between r and t is known to be

    t = r*sqrt(n - 2)/sqrt(1 - r^2)

that is, F = t^2 = r^2*(n-2)/(1 - r^2) ~ F(1, n-2)

This allows you to find the probabilities in a line or two of code.
Here is the calculation as a function that puts the correlations
below the diagonal and the significance probabilities above.

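cor.prob <- function(X, dfr = nrow(X) - 2) {
    R <- cor(X)
    above <- row(R) < col(R)              # upper-triangle positions
    r2 <- R[above]^2
    Fstat <- r2 * dfr / (1 - r2)          # F = t^2, on 1 and dfr df
    # upper tail directly; more accurate than 1 - pf(...) for tiny p
    R[above] <- pf(Fstat, 1, dfr, lower.tail = FALSE)
    R
}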

> X <- matrix(rnorm(1000), 200, 5)
> cor.prob(X)
          [,1]      [,2]     [,3]      [,4]    [,5]
[1,]  1.000000  0.359702 0.872159  0.296850 0.07346
[2,] -0.065106  1.000000 0.111743  0.569198 0.41012
[3,] -0.011450  0.112807 1.000000  0.386409 0.47407
[4,]  0.074129  0.040488 0.061574  1.000000 0.40357
[5,]  0.126853 -0.058560 0.050906 -0.059382 1.00000

You can spot-check the results by regression, for example:

> X <- as.data.frame(X)
> names(X)
[1] "V1" "V2" "V3" "V4" "V5"
> summary(lm(V2 ~ V1, X))

....

Residual standard error: 1 on 198 degrees of freedom
Multiple R-Squared: 0.00424,	Adjusted R-squared: -0.00079 
F-statistic: 0.843 on 1 and 198 degrees of freedom,	p-value: 0.36 

-------

The final p-value (0.36) agrees with the 0.359702 in position
[1,2] of the matrix.
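
The same spot-check can be made without fitting a regression.  The
following is only a sketch: it regenerates its own random data (the
seed is an arbitrary choice of mine, so the numbers will not match
the table above), and the names tstat, p.t, p.F and p.ct are purely
illustrative.  It computes the p-value from t, from F = t^2, and
from cor.test(), which should all agree.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
X <- matrix(rnorm(1000), 200, 5)
n <- nrow(X)

r <- cor(X[, 1], X[, 2])
tstat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# two-sided p-value from the t distribution on n - 2 df
p.t <- 2 * pt(abs(tstat), df = n - 2, lower.tail = FALSE)

# the same p-value from F = t^2 on 1 and n - 2 df
p.F <- pf(tstat^2, 1, n - 2, lower.tail = FALSE)

# reference value from the built-in Pearson correlation test
p.ct <- cor.test(X[, 1], X[, 2])$p.value

all.equal(p.t, p.F)
all.equal(p.t, p.ct)
```

Both all.equal() calls should return TRUE, up to numerical tolerance.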

This assumes you have a common degrees of freedom (n-2) for all
correlations in the table.  If you are using something like
"pairwise deletion" of missing values you have a bit more work to
do (and I hope your matrix comes out as positive definite - there
is no guarantee it will).
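
For the pairwise-deletion case, one possible way of doing that extra
work is sketched below.  The function name cor.prob.pairwise is
hypothetical (it is not part of R), and the approach is an assumption
on my part: count the complete pairs for each pair of columns and use
that count, less 2, as the degrees of freedom for that cell.  It does
nothing about the positive definiteness problem.

```r
# Hypothetical helper: like cor.prob, but with a separate sample
# size (and hence degrees of freedom) for each pair of columns.
cor.prob.pairwise <- function(X) {
    X <- as.matrix(X)
    R <- cor(X, use = "pairwise.complete.obs")
    ok <- !is.na(X)
    N <- crossprod(ok + 0)     # N[i, j] = rows with both columns i and j observed
    dfr <- N - 2
    above <- row(R) < col(R)   # upper-triangle positions
    r2 <- R[above]^2
    Fstat <- r2 * dfr[above] / (1 - r2)
    R[above] <- pf(Fstat, 1, dfr[above], lower.tail = FALSE)
    R
}

# Illustrative data: random normal values with 50 cells set to NA
set.seed(2)                    # arbitrary seed
X <- matrix(rnorm(1000), 200, 5)
X[sample(length(X), 50)] <- NA
P <- cor.prob.pairwise(X)

# spot-check one cell against cor.test(), which drops incomplete pairs
all.equal(P[1, 2], cor.test(X[, 1], X[, 2])$p.value)
```

As with cor.prob, correlations are below the diagonal and the
significance probabilities above it.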

-- 
-----------------------------------------------------------------
Bill Venables, Statistician, CMIS Environmetrics Project.

Physical address:                            Postal address:
CSIRO Marine Laboratories,                   PO Box 120,       
233 Middle St, Cleveland, Queensland         Cleveland, Qld, 4163
AUSTRALIA                                    AUSTRALIA

Telephone: +61 7 3826 7251     Email: Bill.Venables at cmis.csiro.au     
      Fax: +61 7 3826 7304