[R] Correlation matrix removing insignificant R values

Frank Harrell f.harrell at vanderbilt.edu
Wed Nov 23 15:13:09 CET 2011

I think it would be better to think of this as an estimation problem rather
than a selection problem.  If the correlation matrix is of interest,
estimate the entire matrix.  If you want to show that you can make decisions
on the basis of the matrix, then use the bootstrap to get a confidence
interval for quantities of interest.  For example, you can bootstrap the
ranks of the absolute values of the correlation coefficients to get
nonparametric bootstrap percentile confidence limits for those ranks.  You
will be disappointed in the widths of these intervals, which demonstrate how
hard it is to select winners and losers from non-huge datasets.  For
instance, the bootstrap might show that for the apparent highest correlation
you can be only 95% confident that the pair of variables does not possess
one of the 10 worst correlations.
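
The rank bootstrap described above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not a definitive
implementation: the data are simulated stand-ins for the real sales matrix,
the names sales, abs_cor_ranks, pair_names and result are made up for the
example, only 6 products are used to keep the output short, and the months
are resampled as if they were independent (which ignores any time-series
structure in real sales data).

```r
# Sketch: nonparametric bootstrap percentile limits for the ranks of |r|.
# Simulated data stand in for the real sales matrix (assumption).
set.seed(1)
n <- 10                                   # months (observations)
p <- 6                                    # products (56 in the original question)
sales <- matrix(rnorm(n * p), nrow = n, ncol = p,
                dimnames = list(NULL, paste0("prod", seq_len(p))))

# Ranks of the absolute correlations over the unique pairs.
# Rank 1 is the weakest correlation, rank choose(p, 2) the strongest.
abs_cor_ranks <- function(x) {
  r <- cor(x)
  rank(abs(r[upper.tri(r)]))
}

# Pair labels in the same (column-major) order that upper.tri() extracts.
idx <- which(upper.tri(diag(p)), arr.ind = TRUE)
pair_names <- paste(colnames(sales)[idx[, 1]], colnames(sales)[idx[, 2]],
                    sep = ":")

# Resample months with replacement and recompute the ranks each time.
B <- 1000
boot_ranks <- replicate(B, abs_cor_ranks(sales[sample(n, replace = TRUE), ]))

# Percentile limits for each pair's rank (rows = pairs, columns = limits).
ci <- t(apply(boot_ranks, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE))

result <- data.frame(pair = pair_names,
                     observed_rank = abs_cor_ranks(sales),
                     lower = ci[, 1],
                     upper = ci[, 2])

# Strongest apparent correlations first.
print(result[order(-result$observed_rank), ], row.names = FALSE)
```

With only 10 observations, expect the limits for most pairs to span a large
part of the possible range of ranks, which is the point made above.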

mgranlie wrote
> Hello.
> I have a large dataset with sales per month for 56 products over 10 months,
> and I have tried to see how the sales are correlated using
> cor()
> This has given me a 56x56 matrix with the R value for each product pair.
> Most of these correlations are insignificant, and I want to retain only
> the instances where the R value is significant (for 10 observations it
> should be above 0.64).
> Can someone help with this?

Frank Harrell
Department of Biostatistics, Vanderbilt University
View this message in context: http://r.789695.n4.nabble.com/Correlation-matrix-removing-insignificant-R-values-tp4099412p4099719.html