[R] resampling for correlation and testing

Thu Mar 29 06:47:49 CEST 2012

On Wed, Mar 28, 2012 at 8:03 PM, Benton, Paul
<hpaul.benton08 at imperial.ac.uk> wrote:
>
> On Mar 29, 2012, at 1:41 AM, ilai wrote:
>
>> On Wed, Mar 28, 2012 at 3:53 PM, Benton, Paul
>> <hpaul.benton08 at imperial.ac.uk> wrote:
>>> Hello all R-ers,

 <snip>

>>>
>>> ## Then test if rho.A[1,1] comes from the distribution of rho.B[1,1]
>>> pvalueMat[1,1]<-wilcox.test(rho.array[1,1,] , rho.A[1,1])$p.value
>>>
>>
>> From what I know cor(A)[ i , i ] = cor(B)[ j , j ] = 1   for any
>> choice of A,B,i and j
>
> No, cor(a)[i, j] != cor(b)[i , j]

Clearly. I was talking about the diagonal elements (which in a
correlation matrix are all 1's). I haven't looked at your function
yet (I'll try to find the time soon), so this may be a moot point, but
I think the answer to your question may lie here. Correct me if I'm
wrong (it's been known to happen from time to time :) but here is how
I see it:

For corr matrices A (n x n) and B (p x p) you "only" need to test each
of lower.tri.A[i,j], where i != j, vs. some population of rho's which
are resamples from the vector of p(p-1)/2 lower.tri.B values => you
will be performing n(n-1)/2 tests, each against 500 resampled rho's.
Still maybe a daunting task for large n, but much more manageable than
creating the full dim array of matrices, where the diagonal is
meaningless and the other half is just a "repeat". I know - you
actually want 1000 resamples, not 500, and in your original you remove
the diagonal, etc. etc. but as you yourself noted, creating even an
empty array of these dimensions was not feasible.
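
A minimal sketch of the idea above, under stated assumptions: the
example data, the dimensions (n = 6, p = 8) and n.resamp are made up
for illustration, and the one-sample form wilcox.test(x, mu = ) is used
only as a stand-in for the two-sample call in the original post.
Whether a Wilcoxon test is appropriate here at all is a separate
question.

```r
set.seed(1)

# Hypothetical example data: 20 observations, 6 variables in A, 8 in B
A <- matrix(rnorm(20 * 6), nrow = 20)
B <- matrix(rnorm(20 * 8), nrow = 20)

rho.A <- cor(A)
rho.B <- cor(B)

# Keep only the strict lower triangle (no diagonal, no mirrored half)
low.A <- rho.A[lower.tri(rho.A)]   # n(n-1)/2 values
low.B <- rho.B[lower.tri(rho.B)]   # p(p-1)/2 values

n.resamp <- 500  # size of each resampled population of rho's (assumed)

# One test per lower-triangle element of rho.A: draw a resample from
# low.B and test whether it is centred on the observed value
p.low <- vapply(low.A, function(r) {
  pop <- sample(low.B, n.resamp, replace = TRUE)
  wilcox.test(pop, mu = r, exact = FALSE)$p.value
}, numeric(1))

# Put the p-values back into an n x n matrix; the diagonal stays NA
pvalueMat <- matrix(NA_real_, nrow(rho.A), ncol(rho.A))
pvalueMat[lower.tri(pvalueMat)] <- p.low
pvalueMat[upper.tri(pvalueMat)] <- t(pvalueMat)[upper.tri(pvalueMat)]
```

Only one resampled vector of length n.resamp exists at any time, so
the peak storage is the n(n-1)/2 p-values plus the two correlation
matrices. By comparison, a full 2300 x 2300 x 500 array of doubles
would hold 2.645e9 values, roughly 21 GB.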

Hope I'm not way off and this helps somewhat
Elai

>
> If your concern is that they come from the same distribution, then again, this is example data. Even then, I would imagine that the correlations would differ for small n. Either way, resampling the columns will give different correlation values. Please try the code yourself.

>
>> I don't think Wilcoxon intended his test to be used in this way...
> Probably true ... care to suggest a different statistical test?
>
>>
>> I would start with fixing these issues first so you don't wait 2 days
>> for a vector of NaN's
>
> Actually, I didn't get a vector of NaN's; I got a matrix of p-values, which has proved useful. Now I want to make the function faster and was looking for a bit of help.
>
>>
>> Cheers
>>
>>
>>> However, my array size would be 2300 x 2300 x 500, which R won't even let me create as an empty structure. Any suggestions are more than welcome!
>>>
>>> Cheers,
>>>
>>> Paul
>>>