[R] Fisher's r to z' transformation - help needed

Mike White mikewhite.diu at btconnect.com
Thu May 24 09:35:19 CEST 2007


Duncan and Peter
[resent to include on R-Help]
Thank you for your help. It seems my data structure is not suitable for use
with Fisher's z' transformation.

The simulated data was intended to represent the outputs of several
instruments over time. Each row of dat represents the output from one
instrument on a particular day and each column represents a variable being
measured.  Each instrument's sensitivity is different and there may be a small
offset, so that the output is effectively transformed as ax + b, where x is
the 'true' output and the values of a and b are not known.  Pearson's r was
therefore used to check the correlation between outputs.  I then want to
plot the r values on a control chart and set an upper warning line and
action line for the maximum acceptable value of 1-r, based on either
comparing each output with every other output or comparing each output to
the mean of the outputs.  I was then hoping to use Fisher's z' transformation
to set the usual warning and action lines based on a one-sided normal
distribution.
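
For reference, this is roughly what I had in mind for the z' based limits. It
is only a sketch: the helper name fisher_limits, the in-control correlation
r_target, and the choice of 2 and 3 standard errors for the warning and action
lines are my own assumptions, and it relies on r coming from n independent
bivariate-normal pairs, which may well not hold for my data.

```r
# Hypothetical helper: one-sided upper limits for 1 - r via Fisher's z'.
# Assumes r is computed from n independent bivariate-normal pairs.
fisher_limits <- function(r_target, n, k = c(warning = 2, action = 3)) {
  z_target <- atanh(r_target)     # same as 0.5 * (log(1 + r) - log(1 - r))
  se <- 1 / sqrt(n - 3)           # approximate standard error of z'
  z_low <- z_target - k * se      # a low z' corresponds to a high 1 - r
  1 - tanh(z_low)                 # back-transform to the 1 - r scale
}

# Example call with assumed values: in-control r of 0.99, 10 variables
lim <- fisher_limits(r_target = 0.99, n = 10)
print(lim)
```

A 1-r value above the first limit would be a warning and above the second an
action, if the assumptions were reasonable.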

The only alternative I can think of is to use the simulation to produce the
distribution of r and then use the quantile function to set limits based on
probability.  I would be grateful for any help and advice you can provide.
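
A rough sketch of that simulation approach is below. All of the numbers (the
number of variables, the ranges of a and b, the noise level, the
probabilities) are made-up assumptions for illustration, not values from my
real instruments, and each simulated output is compared with the mean of the
other outputs rather than a mean that includes itself.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n_var <- 10                       # variables per output (assumed)
n_sim <- 5000                     # number of simulated outputs (assumed)
truth <- rnorm(n_var)             # hypothetical 'true' output x
a <- runif(n_sim, 0.5, 2)         # unknown sensitivities (assumed range)
b <- rnorm(n_sim, sd = 0.1)       # unknown small offsets (assumed size)
noise <- matrix(rnorm(n_sim * n_var, sd = 0.1), n_sim, n_var)

# Row i is a[i] * truth + b[i] plus measurement noise
dat <- outer(a, truth) + b + noise

# Correlate each row with the mean of all the other rows
tot <- colSums(dat)
r <- sapply(seq_len(n_sim), function(i) {
  cor(dat[i, ], (tot - dat[i, ]) / (n_sim - 1))
})

# Empirical one-sided limits for 1 - r (probabilities are assumptions)
limits <- quantile(1 - r, probs = c(0.975, 0.999))
print(limits)
```

The extreme quantile is only as good as the number of simulated outputs, so
the action limit in particular would need a large n_sim to be stable.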

Thanks
Mike White

----- Original Message ----- 
From: "Duncan Murdoch" <murdoch at stats.uwo.ca>
To: "Mike White" <mikewhite.diu at btconnect.com>
Cc: <R-help at stat.math.ethz.ch>
Sent: Wednesday, May 23, 2007 1:38 PM
Subject: Re: [R] Fisher's r to z' transformation - help needed


> On 5/23/2007 7:40 AM, Mike White wrote:
> > I am trying to use Fisher's z' transformation of the Pearson's r but the
> > standard error does not appear to be correct.  I have simulated an example
> > using the R code below.  The z' data appears to have a reasonably normal
> > distribution but the standard error given by the formula 1/sqrt(N-3) (from
> > http://davidmlane.com/hyperstat/A98696.html) gives a different result from
> > sd(z).  Can anyone tell me where I am going wrong?
>
> Your simulation is very strange.  Why are you calculating the
> correlation of data with its own mean?
>
> Here's a simpler simulation that seems to confirm the approximation is
> reasonable:
>
>  > p <- 10
>  > sdx <- 1
>  > sdy <- 1
>  > x <- matrix(rnorm(1000*p, sd=sdx), 1000, p)
>  > y <- matrix(rnorm(1000*p, mean=x, sd=sdy), 1000, p)
>
> The true correlation is sdx/sqrt(sdx^2 + sdy^2), i.e. 0.71.
>
>  > r <- numeric(1000)
>  > for (i in 1:1000) r[i] <- cor(x[i,], y[i,])
>  > f <- 0.5*(log(1+r) - log(1-r))
>  > sd(f)
> [1] 0.3739086
>  > 1/sqrt(p-3)
> [1] 0.3779645
>
>  > p <- 5
>  > x <- matrix(rnorm(1000*p, sd=sdx), 1000, p)
>  > y <- matrix(rnorm(1000*p, mean=x, sd=sdy), 1000, p)
>  > r <- numeric(1000)
>  > for (i in 1:1000) r[i] <- cor(x[i,], y[i,])
>  > f <- 0.5*(log(1+r) - log(1-r))
>  > sd(f)
> [1] 0.6571383
>  > 1/sqrt(p-3)
> [1] 0.7071068
>
> Duncan Murdoch
>


